Article code: 495773
Journal code: 862837
Publication year: 2014
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
An approach for selecting seed URLs of focused crawler based on user-interest ontology
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract


• We propose constructing the user-interest ontology from the user log profile. The construction has three steps: concept selection, generation of an optimized concept lattice, and translation of the concept lattice into the user-interest ontology. Notably, the user-interest ontology can be revised and updated using the optimized concept lattice produced by Formal Concept Analysis (FCA). The longer the user uses the search engine, the richer the knowledge in the user-interest ontology becomes.
• Based on the user-interest ontology, we propose the seed URL selection approach. We discuss the user feature concept vectors, the semantic Web pages, the base set of HITS, the bipartite directed graph, and the complete bipartite directed graph as they are used in seed URL selection. The advantage of the approach is that it combines the semantic content and the link information of Web pages.
• We run experiments in a simulated Web environment and in the real Web environment; they show that our proposed seed URL selection approach outperforms both the HITS approach and the random approach.
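
For readers unfamiliar with the HITS baseline named in the highlights, the following is a minimal sketch of the standard hub/authority iteration on a small link graph. It is not the authors' implementation: the function names `hits` and `top_seeds`, the dictionary-based graph format, and the fixed iteration count are all assumptions made for illustration, and the sketch uses link structure only, without the semantic content the paper adds.

```python
from math import sqrt


def hits(links, iterations=50):
    """Plain HITS on a directed graph.

    links: dict mapping a page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return hub, auth


def top_seeds(links, k=2):
    """Pick the k pages with the highest authority score as candidate seeds."""
    _, auth = hits(links)
    return sorted(auth, key=lambda p: (-auth[p], p))[:k]


if __name__ == "__main__":
    # Toy graph: pages "a" and "b" both point to "c"; "c" points to "d".
    toy = {"a": ["c"], "b": ["c"], "c": ["d"]}
    print(top_seeds(toy, k=1))
```

In the toy graph, page "c" receives links from two pages and therefore ends up with the highest authority score.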

Seed URL selection for a focused Web crawler aims to steer the crawler toward related and valuable information that meets a user's personal information requirements, and so to provide more effective information retrieval. In this paper, we propose a seed URL selection approach based on a user-interest ontology. To enrich the semantic query, we first apply Formal Concept Analysis to construct user-interest concept lattices from the user log profile. By merging concept lattices, we construct the user-interest ontology, which describes the implicit concepts and the relationships between them more appropriately for semantic representation and query matching. We then use the user-interest ontology to extract the user's interest topic area and to expand user queries, so that the most related pages are retrieved as seed URLs, which serve as the entry point of the focused crawler. In particular, we focus on how to refine the user topic area using the bipartite directed graph. The experiments show that the user-interest ontology can be built effectively by merging concept lattices, and that the proposed approach selects a high-quality collection of seed URLs and improves the average precision of the focused Web crawler.
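
To make the Formal Concept Analysis step more concrete, here is a minimal sketch that enumerates the formal concepts (extent, intent pairs) of a tiny binary context. The context below, with logged queries as objects and keywords as attributes, is invented for illustration and is not taken from the paper; the function name `formal_concepts` is likewise an assumption. The enumeration is brute force over all object subsets, so it only suits very small contexts, and it does not cover the lattice optimization or merging steps the abstract describes.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a binary context.

    context: dict mapping an object (e.g. a logged query) to the set of
             attributes (e.g. keywords) it has. Hypothetical input format.
    Returns a list of (extent, intent) pairs as frozensets.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()

    def intent_of(objs):
        # Attributes shared by every object in objs (all attributes if empty).
        sets = [set(context[o]) for o in objs]
        return frozenset(set.intersection(*sets)) if sets else frozenset(all_attrs)

    def extent_of(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= set(context[o]))

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = intent_of(objs)
            # (extent_of(intent), intent) is always a closed pair, i.e. a concept.
            concepts.add((extent_of(intent), intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


if __name__ == "__main__":
    # Invented example context: three logged queries and their keywords.
    log_context = {
        "q1": {"python", "crawler"},
        "q2": {"crawler", "ontology"},
        "q3": {"crawler"},
    }
    for extent, intent in formal_concepts(log_context):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion gives the concept lattice; in this toy context the concept whose intent is just "crawler" sits at the top because every query shares that keyword.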


Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 14, Part C, January 2014, Pages 663–676