Web information recuperation from strewn text resource systems

Agrawal, Anil; Husain, Mohamed; Tiwari, Raj Gaurang; Vishwakarma, Suneel

Journal Article

Web information recuperation from strewn text resource systems
Citation	Agrawal A, Husain M, Tiwari RG, Vishwakarma S. Int. J. Adv. Eng. Technol. 2011; 1(2): 126-137.
Copyright	(Copyright © 2011, IAET Publishing House)
DOI	unavailable
PMID	unavailable
Abstract	The Internet has become a vast information source in recent years and can be considered the world's largest digital library. To help ordinary users find desired data in this library, many search engines have been created. Each search engine has a corresponding database that defines the set of documents it can search. Usually, an index of all documents in the database is created and stored in the search engine. For each term, which can represent a significant word or a combination of several (usually adjacent) significant words, this index can quickly identify the documents that contain the term. Frequently, the information needed by a user is stored in the databases of multiple search engines. As an example, consider a user who wants to find papers in a subject area. The desired papers are likely to be scattered across the databases of a number of publishers and/or universities. Text data on the Internet can be partitioned naturally into many databases. Efficient retrieval of desired data can be achieved if we can accurately predict the usefulness of each database, because with such information we only need to retrieve potentially useful documents from useful databases. For a given query 'q', the usefulness of a text database is defined as the number of documents in the database that are sufficiently relevant to 'q'. In this paper we propose new approaches for database selection and document selection. In the first part of our work we present an algorithm, DBSEL, for database selection. This algorithm selects, from a number of databases, those that contain the query 'q'. It tests each database against the documents stored in it. If any document in the database contains the query 'q' at least once, the database is selected. If no document in the database contains the query 'q', the database is not selected.
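The database selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' DBSEL implementation: the abstract does not say how databases are represented or how a match is tested, so the dictionary layout, the helper name `contains_query`, and the case-insensitive substring match are all assumptions.

```python
from typing import Dict, List


def contains_query(document: str, query: str) -> bool:
    """Assumed matching rule: case-insensitive substring test."""
    return query.lower() in document.lower()


def dbsel(databases: Dict[str, List[str]], query: str) -> List[str]:
    """Return the names of databases in which at least one document
    contains the query. Databases with no matching document are skipped.

    `databases` maps a (hypothetical) database name to its documents.
    """
    selected = []
    for name, documents in databases.items():
        # any() stops at the first matching document, so a database is
        # selected as soon as one occurrence of the query is found.
        if any(contains_query(doc, query) for doc in documents):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    toy = {
        "publisher_db": ["A paper on web retrieval", "Unrelated text"],
        "university_db": ["Nothing relevant here"],
    }
    print(dbsel(toy, "retrieval"))
```

Under these assumptions, a database is kept if a single document matches, which mirrors the "at least once" condition in the abstract.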
In the second part of our work we present an algorithm, HighRelDoc, for document selection. This algorithm searches all the selected databases and selects only those documents in which the query 'q' occurs at least once. It then ranks the selected documents in descending order of the number of occurrences of 'q'. Finally, it returns the top 'n' most relevant documents from the sorted list, for any positive integer 'n'.
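The document selection and ranking steps described above could look something like the sketch below. It is an illustration under stated assumptions rather than the authors' HighRelDoc code: the input layout, the function names, and the way occurrences are counted (case-insensitive, non-overlapping substring matches via `str.count`) are not specified in the abstract and are chosen here for simplicity.

```python
from typing import Dict, List, Tuple


def count_occurrences(document: str, query: str) -> int:
    """Assumed counting rule: case-insensitive, non-overlapping matches."""
    return document.lower().count(query.lower())


def high_rel_doc(
    selected_databases: Dict[str, List[str]], query: str, n: int
) -> List[Tuple[int, str, str]]:
    """Return up to n (count, database_name, document) tuples, ordered by
    descending occurrence count of the query.

    `selected_databases` maps a (hypothetical) database name to its documents.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    scored = []
    for name, documents in selected_databases.items():
        for doc in documents:
            count = count_occurrences(doc, query)
            if count >= 1:  # keep only documents that contain the query
                scored.append((count, name, doc))

    # Sort by count only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    toy = {
        "a": ["web search web", "no match"],
        "b": ["web web web", "the web"],
    }
    for count, name, doc in high_rel_doc(toy, "web", 2):
        print(count, name, doc)
```

Ranking by raw occurrence count follows the abstract literally; tie-breaking is not described in the source, so this sketch simply preserves the order in which documents were scanned.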