Article ID: 1099720
Journal: Library & Information Science Research
Published Year: 2008
Pages: 8
File Type: PDF
Abstract

Large sets of Web page links, colinks, or URLs sometimes need to be counted or otherwise summarized by researchers to analyze Web growth or publishing. Computing professionals also use them to evaluate Web sites or optimize search engines. Despite the apparently simple nature of these types of data, many different summarization methods have been used in the past. Some of these methods may not have been optimal. This article proposes a generic lexical framework to unify and extend existing methods through abstract notions of link lists and URL lists. The approach is built upon decomposing URLs by lexical segments, such as domain names, and systematically characterizing the counting options available. In addition, counting method choice recommendations are inferred from a very general set of theoretical research assumptions. The article also offers practical advice for analyzing raw data from search engines.
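To make the idea of decomposing URLs into lexical segments concrete, the following is a minimal sketch, not the article's own framework. It assumes three illustrative aggregation levels (`page`, `domain`, `tld`), and the function names `lexical_segment` and `summarize` are hypothetical. It shows how the same URL list can yield different counts depending on the segment chosen.

```python
from collections import Counter
from urllib.parse import urlsplit


def lexical_segment(url: str, level: str) -> str:
    """Reduce a URL to one lexical segment (levels are illustrative assumptions)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if level == "page":
        # Host plus path, ignoring scheme, query string, and fragment.
        return host + parts.path
    if level == "domain":
        # Full host name, e.g. "www.example.com".
        return host
    if level == "tld":
        # Last dot-separated label of the host name, e.g. "com".
        return host.rsplit(".", 1)[-1]
    raise ValueError(f"unknown level: {level!r}")


def summarize(urls, level: str) -> Counter:
    """Count how many URLs in the list fall under each segment at the given level."""
    return Counter(lexical_segment(u, level) for u in urls)


if __name__ == "__main__":
    # Hypothetical URL list standing in for raw search engine results.
    urls = [
        "http://www.example.com/a.html",
        "http://www.example.com/b.html",
        "http://news.example.org/",
    ]
    for level in ("page", "domain", "tld"):
        print(level, dict(summarize(urls, level)))
```

In this sketch, the two pages on `www.example.com` count separately at the page level but collapse into a single entry with a count of 2 at the domain level, which is the kind of counting choice the framework is meant to characterize systematically.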
