The invisible web

There is a growing body of web-based material that, for a variety of technical and logistical reasons, traditional search engines do not or cannot index. Recent estimates suggest that this so-called "inaccessible" part of the web is about 500 times larger than the portion that traditional search engines make accessible.

You should not assume, however, that non-indexed information is of no use to researchers. The types of material that could fall under the generic heading of the "Invisible Web" include the following (this is not a comprehensive list):

  • Some brand new websites - it can take time for traditional search engines to find them
  • Contents of searchable databases
  • Disappearing websites - material that is hard to reach because its URL has changed
  • Material deliberately excluded from traditional retrieval tools by web administrators
  • Sites hosted on an intranet or behind a "firewall" of some kind
  • Contents of large websites - many search engines impose a limit on the number of pages indexed from any one particular site
  • Professional online resources (fee-based or licensed) e.g., legal or company information
  • Websites requiring user registration or login (free or fee-based)
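
One common way administrators deliberately exclude material from search engines is a robots.txt file, which well-behaved crawlers consult before indexing a site. The following is a minimal sketch in Python, using the standard library's urllib.robotparser, of how such a check might work. The robots.txt content, the example.org URLs, and the helper name is_crawlable are hypothetical and chosen purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/public/index.html",
        "https://example.org/private/report.html",
    ):
        status = "indexable" if is_crawlable(ROBOTS_TXT, url) else "excluded"
        print(f"{url}: {status}")
```

Pages excluded in this way remain reachable by anyone who knows the address; they simply do not appear in the results of search engines that honour the file.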

There is an increasing number of resources designed to address this topic, offering valuable insights into how to undertake more focussed and effective searches:

  • Search the Internet - Invisible Web
    A listing of useful retrieval tools maintained by the Library service at Queensland University of Technology.
  • Direct Search
    Created and maintained by Gary Price of George Washington University, this incredibly detailed site is "... a growing compilation of links to the search interfaces of resources that contain data not easily or entirely searchable/accessible from general search tools like Alta Vista, Google, and Hotbot."
  • Invisible Web (Deep Web)
    Includes details of web gateways and directories as well as search engines.