Article ID: 487402
Journal: Procedia Computer Science
Published Year: 2015
Pages: 7 Pages
File Type: PDF
Abstract

Web log data available on the server side helps identify the most appropriate pages for a user request. Analyzing web log data is challenging because it contains abundant information about a web page. This paper proposes a novel technique for pre-processing web log data to extract sequences of occurrence and navigation patterns that are helpful for prediction. Each URL in the web log data is parsed into tokens based on the web structure. Tokens are uniquely identified for the classification of URLs. The sequence of URLs navigated by a user over a period of 30 minutes is treated as a session, which represents the navigation pattern of that user. Sessions from multiple users are clustered using a hierarchical agglomerative clustering technique to analyze the occurrence of sequences in the navigation patterns. From each cluster, one session is identified as the representative because it holds the most pages in the sequence; the other sessions in the cluster are subsets of the representative session. Representative navigation patterns are useful for predicting the most appropriate pages for a user request. The proposed model is tested on web log files of NASA and enggresources.
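
The pipeline described in the abstract (tokenizing URLs, cutting 30-minute sessions, clustering them, and picking a representative per cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the distance measure, the linkage, or the stopping rule, so the Jaccard distance over page sets, the single linkage, and the 0.5 threshold used here are assumptions, as are the helper names and the sample request paths. The 30-minute session is interpreted as a fixed window measured from the first request of the session, which is also an assumption.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

SESSION_WINDOW = timedelta(minutes=30)  # session length stated in the abstract


def tokenize(url):
    """Split a URL path into tokens along its directory structure."""
    return [part for part in urlparse(url).path.split("/") if part]


def sessionize(requests):
    """Group (user, timestamp, url) records into per-user sessions.

    Assumption: a session holds every request a user makes within
    30 minutes of that session's first request.
    """
    sessions = []
    open_session = {}  # user -> (session start time, list of URLs)
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        current = open_session.get(user)
        if current is None or ts - current[0] > SESSION_WINDOW:
            current = (ts, [])
            open_session[user] = current
            sessions.append(current[1])
        current[1].append(url)
    return sessions


def jaccard_distance(a, b):
    """Assumed distance: 1 minus the overlap of the two page sets."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


def agglomerative(sessions, threshold=0.5):
    """Single-linkage agglomerative clustering (linkage and threshold assumed).

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than the threshold.
    """
    clusters = [[s] for s in sessions]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(jaccard_distance(x, y)
                        for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


def representative(cluster):
    """Pick the session that covers the most distinct pages."""
    return max(cluster, key=lambda s: len(set(s)))


# Hypothetical request records, for illustration only.
t0 = datetime(1995, 7, 1, 0, 0)
requests = [
    ("u1", t0, "/shuttle/missions/"),
    ("u1", t0 + timedelta(minutes=5), "/shuttle/missions/sts-71/"),
    ("u1", t0 + timedelta(minutes=10), "/shuttle/missions/sts-71/images/"),
    ("u1", t0 + timedelta(minutes=50), "/history/apollo/"),
    ("u2", t0, "/shuttle/missions/"),
    ("u2", t0 + timedelta(minutes=3), "/shuttle/missions/sts-71/"),
    ("u3", t0, "/history/apollo/"),
    ("u3", t0 + timedelta(minutes=2), "/history/apollo/apollo-13/"),
]

sessions = sessionize(requests)
clusters = agglomerative(sessions)
representatives = [representative(c) for c in clusters]
```

With this sample data, the request made by `u1` at minute 50 falls outside the first 30-minute window and starts a new session, so three users yield four sessions. A production version would likely compare sessions as ordered sequences rather than as page sets, since the abstract emphasizes the sequence of occurrence.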

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)