Web Site

Internet-description.com





Page modified: Saturday, June 24, 2006 10:36:50

A search engine is a program for searching documents that are stored on a computer or in a computer network such as the World Wide Web. After a search term is entered, a search engine returns a list of references to possibly relevant documents, usually shown with a title and a short excerpt of each document. Various search methods can be applied.

The essential components, or tasks, of a search engine are

  • building and maintaining an index (a data structure holding information about documents),
  • processing search queries (finding and ranking results), and
  • presenting the results in as meaningful a form as possible.
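
The three tasks listed above can be illustrated with a minimal sketch. It assumes a tiny in-memory collection; the function names (build_index, search, present) and the sample documents are made up for illustration, and a real search engine would be far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def present(documents, doc_ids, width=40):
    """Format each hit as 'id: first characters of the text'."""
    return [f"{doc_id}: {documents[doc_id][:width]}" for doc_id in sorted(doc_ids)]


# Hypothetical sample collection.
documents = {
    "a.txt": "Search engines build an index of documents.",
    "b.txt": "A web crawler collects documents from the Web.",
    "c.txt": "The index speeds up every search.",
}
index = build_index(documents)
hits = search(index, "index search")
for line in present(documents, hits):
    print(line)
```

The index is built once and reused for every query, which is why the lookup itself only touches the entries for the query words.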

Data is usually gathered automatically: on the WWW by a web crawler, on an individual computer by regularly reading all files in the directories of the local file system that the user has specified.
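
For the local case, one pass of such a scan might look like the following sketch. The function name collect_files and the choice of file extensions are assumptions for illustration; scheduling the scan to run regularly is left out.

```python
import os
import tempfile


def collect_files(directories, extensions=(".txt", ".html")):
    """Walk the given directories and return {path: text} for matching files."""
    collected = {}
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if not name.lower().endswith(extensions):
                    continue
                path = os.path.join(root, name)
                try:
                    with open(path, encoding="utf-8", errors="replace") as handle:
                        collected[path] = handle.read()
                except OSError:
                    # Unreadable files are skipped rather than aborting the scan.
                    continue
    return collected


# Demonstration on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "notes"))
    with open(os.path.join(base, "notes", "todo.txt"), "w", encoding="utf-8") as f:
        f.write("index the local files")
    with open(os.path.join(base, "image.png"), "wb") as f:
        f.write(b"\x89PNG")
    found = collect_files([base])
    found_names = sorted(os.path.basename(p) for p in found)
    print(found_names)
```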

Types of search engines

Search engines can be categorized by a number of characteristics. The following three characteristics are orthogonal to each other: when designing a search engine, one can choose one option from each of the three groups independently of the others. The most common and most widely used combination is an index-based (implementation) web search engine (data source) for HTML text documents (type of data), as offered, among others, by the three large search engine providers Google, Yahoo! Search and MSN Search.

Type of data

Different search engines can search different kinds of data. These can first be divided roughly into "document types" such as text, image, audio, video and others. Result pages are laid out according to this type. A search for text documents usually shows a text fragment that contains the search terms. Image search engines show a thumbnail view of the matching images.
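
How such a text fragment could be cut out of a document is shown in the following sketch. The function name make_snippet and the radius parameter are hypothetical; real engines choose and highlight fragments in more refined ways.

```python
def make_snippet(text, query_words, radius=30):
    """Return a fragment of text around the earliest occurrence of a query word."""
    lowered = text.lower()
    positions = [lowered.find(word.lower()) for word in query_words]
    positions = [p for p in positions if p >= 0]
    if not positions:
        # No query word found: fall back to the beginning of the text.
        return text[: 2 * radius].strip()
    first = min(positions)
    start = max(first - radius, 0)
    end = min(first + radius, len(text))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


sample = "A search engine returns a list of references to possibly relevant documents."
snippet = make_snippet(sample, ["references"], radius=20)
print(snippet)
```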

A further, finer breakdown concerns data-specific properties that not all documents of a type share. Staying with the text example, Usenet postings can be searched by author, and web pages in HTML format by document title.
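
As a rough illustration of a search restricted to the document title, the sketch below uses Python's standard html.parser module to extract the title of each page. The class TitleParser, the helper functions and the example pages are assumptions made for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def search_by_title(pages, word):
    """Return the URLs of pages whose title contains the word."""
    word = word.lower()
    return [url for url, html in pages.items() if word in extract_title(html).lower()]


# Hypothetical pages: only the first one has the word in its title.
pages = {
    "http://example.org/a": "<html><head><title>Search engines</title></head><body>Cooking</body></html>",
    "http://example.org/b": "<html><head><title>Cooking</title></head><body>search engines</body></html>",
}
print(search_by_title(pages, "search"))
```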

Depending on the type of data, a further function is restriction to a subset of all data of that type. This is generally implemented through additional search parameters that exclude part of the collected data. Alternatively, a search engine can limit itself from the outset to collecting only suitable documents. Examples are a search engine for weblogs (instead of the complete Web), or search engines that process only documents from universities, or exclusively documents from a certain country, in a certain language or in a certain file format.
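
The first variant, additional search parameters that exclude part of the collected data, could be sketched as follows. The Document fields and the parameter names (language, file_format, site_suffix) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Document:
    url: str
    language: str
    file_format: str
    text: str


def filtered_search(documents, word, language=None, file_format=None, site_suffix=None):
    """Return URLs of matching documents, narrowed by the optional parameters."""
    hits = []
    for doc in documents:
        if word.lower() not in doc.text.lower():
            continue
        if language is not None and doc.language != language:
            continue
        if file_format is not None and doc.file_format != file_format:
            continue
        if site_suffix is not None:
            host = urlparse(doc.url).hostname or ""
            if not host.endswith(site_suffix):
                continue
        hits.append(doc.url)
    return hits


# Hypothetical collection.
docs = [
    Document("http://uni.example.edu/paper.pdf", "en", "pdf", "index structures for search"),
    Document("http://blog.example.com/post.html", "en", "html", "my search for coffee"),
    Document("http://uni.example.de/skript.html", "de", "html", "Suche und Index: search basics"),
]
print(filtered_search(docs, "search", language="en"))
print(filtered_search(docs, "search", site_suffix=".edu"))
```

A search engine of the second variant would apply the same kind of test while collecting documents, so that unsuitable documents never enter its data set.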

Data source

A further characteristic for categorization is the source from which the data collected by the search engine originates. The name of the type of search engine usually already describes the source.

Web search engines collect documents from the World Wide Web; Usenet search engines collect postings from Usenet, the globally distributed discussion medium. Intranet search engines are limited to the computers of a company's intranet. Desktop search engines is the recent name for programs that make the local data of an individual computer searchable.

If the data is gathered manually, by means of submissions or by editors, one speaks of a catalog or a directory. In directories such as the Open Directory Project, the documents are organized hierarchically by topic in a table of contents.

Implementation

This section describes differences in how search engines are implemented and operated.

  • The most important group today is index-based search engines. These read in suitable documents and build an index, a data structure that is used for later search queries. The disadvantage is the costly maintenance and storage of the index; the advantage is the faster search.
  • Metasearch engines send search queries in parallel to several index-based search engines and combine the individual results. The advantages are the larger data set and the simpler implementation, since no index has to be maintained. The disadvantage is the relatively long time needed to process a query. In addition, a ranking based purely on majority vote is of doubtful value. The quality of the results may be reduced to the quality of the worst underlying search engine. Metasearch engines are particularly useful for rarely occurring search terms.
  • Further hybrid forms exist. These have their own, often relatively small index, but also query other search engines and finally combine the individual results. So-called real-time search engines, for instance, start the indexing process only after a query. The pages found are therefore always current, but the quality of the results is poor because of the missing broad data base, particularly for less common search terms.
  • A relatively new approach is distributed search engines. A search query is passed on to a large number of individual computers, each of which runs its own search engine, and the results are merged. The advantages are the high reliability due to decentralization and, depending on one's point of view, the impossibility of central censorship. Ranking, however, is hard to solve, that is, sorting the documents that match in principle by their relevance to the query.
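
The metasearch approach described above can be sketched in a few lines. The engines here are stand-in functions that return fixed lists, not real services, and the merge rule (majority vote, ties broken by best position) is only one simple possibility that shows why such a ranking is of limited value.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def metasearch(query, engines):
    """Query all engines in parallel and merge their ranked result lists."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    votes = defaultdict(int)  # how many engines returned the URL
    best_rank = {}            # best (lowest) position seen for the URL
    for results in result_lists:
        for rank, url in enumerate(results):
            votes[url] += 1
            best_rank[url] = min(rank, best_rank.get(url, rank))

    # More votes first; ties broken by best position, then by URL for stability.
    return sorted(votes, key=lambda url: (-votes[url], best_rank[url], url))


# Stand-ins for real search engines: each returns a fixed ranked list.
def engine_a(query):
    return ["http://example.org/1", "http://example.org/2", "http://example.org/3"]


def engine_b(query):
    return ["http://example.org/2", "http://example.org/4"]


def engine_c(query):
    return ["http://example.org/2", "http://example.org/1"]


merged = metasearch("rare term", [engine_a, engine_b, engine_c])
for url in merged:
    print(url)
```

Because the slowest engine determines when the merged list is complete, the sketch also reflects the longer query time mentioned above.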

Ranking of the results

Search results are presented sorted by relevance (ranking), for which each search engine applies its own criteria, most of which are kept secret. These include:

  • The fundamental importance of a document (at Google, the PageRank value).
  • Frequency and position of the search terms in the document found.
  • Classification and number of cited documents.
  • Frequency of references from other documents to the document contained in the search result, as well as the text contained in those references.
  • Classification of the quality of the referring documents (a link from a "good" document is worth more than a reference from a mediocre document).
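
Since the actual criteria are kept secret, the following is only a toy sketch of how two of the listed criteria might be combined: frequency and position of the search terms, and the quality of referring documents. All names, the weighting (link_weight) and the sample data are assumptions; this is not the method of any particular search engine.

```python
def term_score(text, query_words):
    """Earlier and more frequent occurrences of a query word score higher."""
    words = text.lower().split()
    score = 0.0
    for query_word in query_words:
        positions = [i for i, w in enumerate(words) if w == query_word.lower()]
        if positions:
            frequency = len(positions) / len(words)
            earliness = 1.0 - positions[0] / len(words)
            score += frequency + earliness
    return score


def link_score(doc_id, links, quality):
    """Sum the quality of every document that links to doc_id."""
    return sum(quality.get(source, 0.0) for source, targets in links.items() if doc_id in targets)


def rank(documents, links, quality, query_words, link_weight=0.5):
    """Return document ids sorted by combined score, best first."""
    scores = {
        doc_id: term_score(text, query_words) + link_weight * link_score(doc_id, links, quality)
        for doc_id, text in documents.items()
    }
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical data: p2 is linked from a "good" document, p1 from a mediocre one.
documents = {
    "p1": "search engines rank documents",
    "p2": "documents about search and more search",
    "p3": "cooking recipes",
}
links = {"good": ["p2"], "poor": ["p1"]}
quality = {"good": 1.0, "poor": 0.1}

print(rank(documents, links, quality, ["search"]))
print(rank(documents, links, quality, ["search"], link_weight=0.0))
```

With the link weight set to zero, the order of p1 and p2 flips, which shows how strongly the result depends on the chosen weighting.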

Some search engines sort search results not only by relevance to the search query but also allow their output to be influenced in return for payment. In recent years, however, the large providers have established a separation between search results and interspersed advertising that is marked as "paid hits" and tailored to the search query.


Page cached: Wednesday, July 5, 2006 23:53:24