It's Everywhere... It's Everywhere... |
Don E. Descy | |
Minnesota State University | |
Today |
Searching, Searching, Searching | |
The Searchable Web | |
The Invisible Web. (deep Web) | |
What is it? | |
Why is it? | |
How do we get around it? | |
Resources/References |
Why Is This Important?? |
You are going to want information... *Reports, papers, presentations *Medical, family, jobs, personal. |
|
Your students are going to want information... *Reports, papers, presentations, personal. |
|
Most of what you want can't be found using regular search techniques! |
The Question: |
How do you find information | |
that is available... but isn't?? |
|
How do you find your exit on the "Information Superhighway" if mapping that exit can't be done? |
The Invisible Web |
Web sites that are hidden or are unable to be found or cataloged by regular search engines. |
"Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web." (BrightPlanet, 2004) |
|
"A full ninety-five per cent of the deep Web is publicly accessible information - not subject to fees or subscriptions." (BrightPlanet, 2004) |
|
The Invisible Web Facts |
200,000+ Web sites. | |
550 billion individual documents compared to the three billion of the surface Web. | |
Contains 7,500 terabytes of information compared to nineteen terabytes in the surface Web. | |
Total quality content is 1,000 to 2,000 times greater than the surface Web. | |
The Invisible Web Facts |
Sixty of the largest sites collectively contain over 750 terabytes (84B pages) of information. They exceed the size of the surface Web forty times over. | |
Fastest growing category of new information on the Internet. | |
Fifty per cent greater monthly traffic than surface sites. |
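The size comparisons on the fact slides can be checked with simple division. The sketch below uses only the figures quoted above (BrightPlanet, 2004); the variable names are illustrative and are not part of the original slides.

```python
# Figures quoted on the fact slides (BrightPlanet, 2004).
deep_docs = 550e9      # individual documents on the deep Web
surface_docs = 3e9     # documents on the surface Web
deep_tb = 7_500        # terabytes of information on the deep Web
surface_tb = 19        # terabytes of information on the surface Web
top60_tb = 750         # terabytes held by the sixty largest deep sites

# How many times larger is the deep Web by each measure?
doc_ratio = deep_docs / surface_docs
size_ratio = deep_tb / surface_tb
top60_ratio = top60_tb / surface_tb

print(f"Documents: about {doc_ratio:.0f} times the surface Web")    # about 183
print(f"Terabytes: about {size_ratio:.0f} times the surface Web")   # about 395
print(f"Top sixty sites: about {top60_ratio:.0f} times")            # about 39
```

By storage the deep Web comes out near the low end of the quoted "400 to 550 times" range, and the sixty largest sites alone come out at roughly forty times the surface Web.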
Invisible Web Facts |
Narrower, with deeper content, than conventional surface sites. | |
More than half of the content resides in topic-specific databases. | |
Content is highly relevant to every information need, market, and domain. |
Invisible Web Facts |
Not well known to the Internet-searching public | |
Searching, Searching, Searching |
Usually carried out using a "directory" or "search engine". | |
Fast and efficient. | |
Misses most of what is out there. | |
70% of searchers start from 3 sites (Nielsen, 7/2004): | |
Google, Yahoo, MSN. |
Searching Tools |
Directories. | |
Search engines. |
Directories |
Hand selected, evaluated, annotated. | |
Broad topics work best. | |
Quality over quantity. | |
Location on list: May be paid. |
How Directories Work |
Directory Problems |
Done by humans. | |
Takes time. | |
No universal categories or cataloging system. | |
Misses most information/sites. |
General Subject Directories |
"Yahoo". | |
Biggest and most famous. | |
Often useful. | |
Information... jobs... travel... shopping... to... |
|
Yahoo.com |
Search Engines |
Computer generated. | |
Must be static and linked. | |
Narrower topics. Quantity over quality. | |
Uses newer retrieval technologies. | |
Location on list: May be paid. | |
Google, Hotbot, Northern Light, AltaVista, etc. |
How Search Engines Work |
Search Engine Problems |
Spiders/robots don't think. | |
More likely to index sites with more links to them (popularity). | |
More likely to index US sites. | |
More likely to index commercial sites. | |
Sites pay for indexing/position. |
At one time showed actual bid! |
Finding Good Search Engines |
UC-Berkeley: Recommended Search Engines: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html |
|
UC-Berkeley: The Best Search Engines (9/2004): | |
#1 Google #2 Teoma | |
#3 Yahoo! Search | |
Who are you really searching? |
Who are you really searching? |
What Do We Miss? |
Library of Congress: 30 million+ documents. | |
ERIC databases. | |
Most daily newspapers. | |
Health and medical databases. | |
Museum and library collections. | |
The information you need???? | |
Why are pages invisible? (1) |
1. Searchable databases: | |
Typing required. | |
Selection of option combination required. |
|
**Pages not available until asked for (ex: Library of Congress). | |
**Pages are not static but dynamic (may not exist until requested). | |
Why are pages invisible? (1) |
Search engines can't handle "dynamic pages". | |
Search engines can't handle "input boxes". |
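A minimal sketch of why an input box stops a spider, using Python's standard `html.parser`. The page markup and the `LinkSpider` class are hypothetical illustrations, not code from any real search engine. The toy spider can collect static links to follow, but a search form gives it nothing to follow, because the result pages do not exist until someone types a query.

```python
from html.parser import HTMLParser


class LinkSpider(HTMLParser):
    """Records what a simple link-following spider would see on one page."""

    def __init__(self):
        super().__init__()
        self.links = []  # static links the spider can follow
        self.forms = []  # search forms the spider cannot fill in

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


# Hypothetical page: one static link plus a database search box.
PAGE = """
<html><body>
  <a href="/about.html">About the collection</a>
  <form action="/search" method="get">
    <input type="text" name="title">
    <input type="submit" value="Search">
  </form>
</body></html>
"""

spider = LinkSpider()
spider.feed(PAGE)
print("Can follow:", spider.links)  # the static link only
print("Stops at:", spider.forms)    # the form needs typed input
```

Everything behind `/search` stays invisible to this spider: it never submits the form, so it never sees the pages the database would generate.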
Why are pages invisible? (2) |
2. Password or Login required: (Spiders do not know passwords or login IDs). |
||
3. Non-HTML pages: | ||
PDF, Word, Shockwave, Flash... | ||
Some search engines may find them (ex: Google, AltaVista). |
Why are pages invisible? (3) |
4. Script-based (computer generated) pages: | ||
Create all or part of Web page. | ||
Contain "?" in URL. | ||
Spiders programmed to back off. | ||
http://calver.org/search/file/ship (yes!) | ||
http://calver.org/search?title=plane (no) |
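The "?" rule can be sketched as a one-line check with Python's standard `urllib.parse`. The function name `spider_backs_off` is a hypothetical label for illustration, and real spiders apply their own, more detailed rules.

```python
from urllib.parse import urlsplit


def spider_backs_off(url: str) -> bool:
    """Return True if the URL has a query string (the part after "?")."""
    return bool(urlsplit(url).query)


# The two example URLs from the slide above.
for url in ("http://calver.org/search/file/ship",
            "http://calver.org/search?title=plane"):
    verdict = "no (skipped)" if spider_backs_off(url) else "yes (indexed)"
    print(url, "->", verdict)
```

The first URL looks like a static file path, so a spider following this rule would index it. The second carries a query string, so the spider would treat it as script-generated and skip it.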
Sites to Check |
Finding Invisible Information |
"Librarians' Index". | |
Compiled by librarians in the "information supply business". | |
Highest quality sites only. | |
Reliable, annotated. | |
www.lii.org |
Finding Invisible Information |
"About". | |
2,400,000 + resources. | |
Wide variety of subjects: Teens, religion, spirituality, shopping, (expected) | |
About.com |
Finding Invisible Information |
"direct search". | |
"Data not easily or entirely searchable/accessible from general search tools." | |
www.freepint.com/gary/direct.htm |
Finding Invisible Information |
"The Invisible Web Catalog". | |
10,000 + searchable databases. | |
Quick search, "Hot List" | |
Sort alphabetically or by score (relevance). | |
www.profusion.com | |
Finding Invisible Information |
www.invisible-web.net |
Finding Invisible Information |
"IncyWincy". | |
Over 100,000 databases. | |
Many links to other search engines. | |
www.incywincy.com |
Finding Invisible Information |
"CompletePlanet": | |
103,000 + databases and specialty search engines. | |
Some 'surface' searching. | |
www.completeplanet.com |
Finding Invisible Information |
Some are research oriented. | |
"Infomine". | |
Infomine.ucr.edu/ | |
"Academic Info". | |
www.academicinfo.net |
So... What To Do... |
Search several sites. | |
Use the "Advanced Search" feature. |
|
Search using the term "Invisible Web" for IW search sites. |
|
Search several "Invisible Web" sites. | |
Questions? |
PowerPoint available at descy.net |