Web Site

Internet-description.com



Deep Web


Page modified: Saturday, June 24, 2006 10:37:29

The Deep Web (also Hidden Web or Invisible Web) is the part of the Internet that cannot be found through a search with ordinary search engines. In contrast, web pages that are accessible through search engines are called the Visible Web or Surface Web. The Deep Web consists largely of topic-specific databases (specialist databases) and web pages that are generated dynamically only in response to database queries. Roughly, the Deep Web can be divided into "content that is not freely accessible" and "content that is not indexed by search engines". The size of the Deep Web can only be estimated - it is assumed to be many times the size of the directly accessible Web. Search engines are constantly being developed further, so web pages that still belonged to the Deep Web yesterday may already be part of the Surface Web today.

Characteristics

According to a study (Bergman 2001) by the company BrightPlanet, the Deep Web has the following characteristics:

The data volume of the Deep Web is about 400 to 550 times larger than that of the Surface Web. The 60 largest deep Web sites alone contain about 750 terabytes of information, which exceeds the volume of the Surface Web by a factor of 40. Reportedly, more than 200,000 deep Web sites exist. According to the study, web pages from the Deep Web receive 50% more visits per month and are on average more frequently linked than web pages from the Surface Web. The Deep Web is also the fastest-growing category of new information on the Web. Nevertheless, the Deep Web is hardly known to the public searching the Internet. More than half of the Deep Web resides in topic-specific databases. More than 95% of the Deep Web is accessible free of charge.

Since BrightPlanet offers a commercial search tool, DQM2, its size estimate, which appears strongly overstated, should be treated with great caution. The estimated data volume of the Deep Web must be adjusted for several items:

  • Duplicates from overlapping library catalogs
  • Data collection of the National Climatic Data Center (370,000 GByte)
  • Data of NASA (220,000 GByte)
  • Further data collections (National Oceanographic Data Center & National Geophysical Data Center, Right to Know Network, Alexa, …)
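
As a rough illustration of this adjustment, the two quantified items in the list can be set against the 750 terabytes attributed to the 60 largest deep Web sites. The sketch below assumes that both collections are counted within that figure and that 1 terabyte equals 1,000 GByte; neither assumption is stated in the text, and all names in the code are illustrative.

```python
# Figures taken from the text. The unit conversion (1 TB = 1,000 GB) and the
# idea that both collections are part of the 750 TB figure are assumptions.
GB_PER_TB = 1_000

top_60_sites_tb = 750
climatic_data_center_gb = 370_000
nasa_gb = 220_000


def remaining_after_adjustment(total_tb, *collections_gb):
    """Subtract the given collections (in GByte) from a total given in terabytes."""
    removed_tb = sum(collections_gb) / GB_PER_TB
    return total_tb - removed_tb


remaining_tb = remaining_after_adjustment(
    top_60_sites_tb, climatic_data_center_gb, nasa_gb
)
share_removed = 1 - remaining_tb / top_60_sites_tb

print(f"remaining: {remaining_tb:.0f} TB, removed share: {share_removed:.0%}")
```

Under these assumptions the two collections alone would account for well over half of the 750 terabytes, which shows why the adjustment matters for the overall estimate.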

Judging by the number of data records, the study appears to overestimate the size of the Deep Web tenfold. However, the information provider LexisNexis alone holds 4.6 billion data records, more than half the number of records held by the leading search engine, Google. The Deep Web is therefore surely far larger than the Surface Web.

A 2003 study by the University of California, Berkeley, determined the following values for the size of the Internet: Surface Web - 167 terabytes, Deep Web - 91,850 terabytes. The printed holdings of the Library of Congress in Washington, the largest library in the world, amount to 10 terabytes.
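
The Berkeley figures can be put in relation to one another with a few divisions. The following sketch uses only the three numbers quoted above; the variable names are illustrative.

```python
# Sizes in terabytes, as quoted in the text for the Berkeley study (2003).
surface_web_tb = 167
deep_web_tb = 91_850
library_of_congress_tb = 10

# How many times larger is the Deep Web than the Surface Web?
deep_to_surface = deep_web_tb / surface_web_tb

# Both parts of the Web expressed in "printed Libraries of Congress".
surface_in_libraries = surface_web_tb / library_of_congress_tb
deep_in_libraries = deep_web_tb / library_of_congress_tb

print(f"Deep Web / Surface Web: {deep_to_surface:.0f}")
print(f"Surface Web: {surface_in_libraries:.1f} Libraries of Congress")
print(f"Deep Web: {deep_in_libraries:.0f} Libraries of Congress")
```

The resulting factor of 550 corresponds to the upper end of the range of 400 to 550 given in the BrightPlanet study.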

Types of Deep Web

According to Sherman & Price (2001), five types of invisible Web are distinguished: "Opaque Web", "Private Web", "Proprietary Web", "Invisible Web" and "Truly Invisible Web".

Opaque Web

The Opaque Web (English opaque: not transparent) consists of web pages that could be indexed but currently are not, for reasons of technical capacity or the cost-benefit ratio (crawl depth, visit frequency).

Search engines do not consider all directory levels and subpages of a website. When capturing web pages, web crawlers follow links to the subsequent pages. Web crawlers cannot navigate on their own; they can even get lost in deep directory structures, fail to capture pages, and be unable to find their way back to the start page. For this reason, search engines often consider at most five or six directory levels. Extensive and therefore relevant documents may lie in deeper hierarchy levels and cannot be found because of the limited crawl depth of search engines.
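
The effect of a limited crawl depth can be shown with a minimal sketch. It assumes a hypothetical site held in memory as a dictionary of links, and a breadth-first crawler that stops following links after a fixed number of levels; real crawlers are considerably more complex.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/a"],
    "/a": ["/a/b"],
    "/a/b": ["/a/b/c"],
    "/a/b/c": ["/a/b/c/d"],
    "/a/b/c/d": ["/a/b/c/d/e"],
    "/a/b/c/d/e": ["/a/b/c/d/e/report.pdf"],
    "/a/b/c/d/e/report.pdf": [],
}


def crawl(links, start, max_depth):
    """Breadth-first crawl that follows links at most max_depth levels below start."""
    seen = {start}  # remembering visited pages keeps the crawler from looping
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: links on this page are not followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


indexed = crawl(LINKS, "/", max_depth=5)
missed = set(LINKS) - indexed
print(sorted(missed))
```

With a limit of five levels, the document on the sixth level is never reached, although it is publicly linked.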

In addition, there are file formats that can be captured only in part (for example PDF: Google indexes only the first 120 KB, about 100,000 characters, of each PDF file).
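
A size limit of this kind can be pictured as simple truncation. The sketch below treats a document as a plain sequence of bytes and assumes that 1 KB equals 1,024 bytes; it does not parse real PDF files, and the function names are illustrative.

```python
INDEX_LIMIT_BYTES = 120 * 1024  # assumption: 1 KB = 1,024 bytes


def indexable_part(document: bytes, limit: int = INDEX_LIMIT_BYTES) -> bytes:
    """Return the leading part of the document that would still be indexed."""
    return document[:limit]


def hidden_bytes(document: bytes, limit: int = INDEX_LIMIT_BYTES) -> int:
    """Return how many trailing bytes fall outside the indexed part."""
    return max(0, len(document) - limit)


sample = b"x" * 500_000  # stand-in for a large file
print(len(indexable_part(sample)), hidden_bytes(sample))
```

Everything beyond the limit remains part of the Opaque Web, even though the file itself appears in the index.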

Coverage also depends on how frequently a website is indexed (daily, monthly). Constantly updated data, such as measurement data and real-time data, are affected as well. Web pages without hyperlinks or a navigation system, unlinked web pages, hermit URLs and orphan pages likewise fall into this category.
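
Orphan pages can be identified in a link graph as pages to which no other page links. The following sketch assumes a hypothetical site given as a dictionary of outgoing links; the page names are invented for illustration.

```python
# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/"],
    "/about": [],
    "/old-price-list": ["/"],  # links out, but nothing links to it
}


def orphan_pages(links, start):
    """Return pages (other than the start page) that have no inbound link."""
    linked = {target for targets in links.values() for target in targets}
    return {page for page in links if page != start and page not in linked}


print(orphan_pages(SITE, "/"))
```

A crawler that only follows links from the start page has no way of discovering such a page.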

Private Web

The Private Web describes web pages that could be indexed but are not, because of access restrictions imposed by the webmaster.

These can be web pages on an intranet (internal web pages), but also password-protected data (registration, password and login), access restricted to certain IP addresses, protection from indexing by the Robots Exclusion Standard, or protection from indexing by the meta tag values noindex, nofollow and noimageindex in the source code of the web page.
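
The last two mechanisms can be sketched with the Python standard library: `urllib.robotparser` evaluates rules of the Robots Exclusion Standard, and `html.parser` can read the robots meta tag. The robots.txt rules, the URLs, the crawler name `ExampleBot` and the helper functions are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="ExampleBot"):
    """Check the URL against the robots.txt rules parsed above."""
    return robots.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the values of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML source."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_indexable(url, html):
    """A page is indexed only if robots.txt allows it and it carries no noindex."""
    return may_crawl(url) and "noindex" not in robots_directives(html)


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(may_crawl("http://example.com/private/report.html"))
print(robots_directives(PAGE))
print(is_indexable("http://example.com/index.html", PAGE))
```

Both mechanisms rely on the cooperation of the crawler; they are requests by the webmaster, not technical barriers.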

Proprietary Web

Proprietary Web refers to web pages that could be indexed but are accessible only after the user has accepted terms of use (free of charge or for a fee).

Such web pages can usually be retrieved only after identification (web-based specialist databases).

Invisible Web

The Invisible Web comprises web pages that could be indexed but are not, for commercial or strategic reasons - for example, databases that are accessed through a web form.

Truly invisible Web

Truly Invisible Web denotes web pages that cannot be indexed. These can be database formats that were created before the WWW (some hosts), documents that cannot be displayed directly in the browser, non-standard formats, as well as file formats that cannot be captured (for example Flash and graphics formats). Also included are compressed data, and web pages that can be operated only through user navigation that relies on graphics (image maps) or scripts (frames). In addition, there are data that search engines intentionally ignore.


Related Websites

  • BrightPlanet

  • The Deep Web
    The deep Web has gotten a lot of press in recent years. ... A company called BrightPlanet has coined the term "deep Web" to describe the phenomenon of ...

  • The Deep Web: Surfacing Hidden Value
    White paper on the Deep Web, an area of the Internet 550 times larger than the surface web crawled...

Page cached: Wednesday, July 5, 2006 23:55:48