| It's Everywhere... It's Everywhere... |
| Don E. Descy | |
| Minnesota State University | |
| Today |
| Searching, Searching, Searching | |
| The Searchable Web | |
| The Invisible Web (deep Web). | |
| What is it? | |
| Why is it? | |
| How do we get around it? | |
| Resources/References |
| Why Is This Important?? |
| You are going to want information:
* Reports, papers, presentations
* Medical, family, jobs, personal. |
|
| Your students are going to want information:
* Reports, papers, presentations, personal. |
|
| Most of what you want can't be found using regular search techniques! |
| The Question: |
| How do you find information | |
| that is available... but isn't?? |
|
| How do you find your exit on the "Information Superhighway" if mapping that exit can't be done? |
| The Invisible Web |
| Web sites that are hidden from, or cannot be found or cataloged by, regular search engines. |
| "Public information on the deep Web is
currently 400 to 550 times larger than the commonly defined World Wide Web." (BrightPlanet, 2004) |
|
| "A full ninety-five per cent of the
deep Web is publicly accessible information - not subject to fees or
subscriptions." (BrightPlanet, 2004) |
|
| The Invisible Web Facts |
| 200,000+ Web sites. | |
| 550 billion individual documents compared to the three billion of the surface Web. | |
| Contains 7,500 terabytes of information compared to nineteen terabytes in the surface Web. | |
| Total quality content is 1,000 to 2,000 times greater than the surface Web. | |
| The Invisible Web Facts |
| Sixty of the largest sites collectively contain over 750 terabytes (84B pages) of information; they exceed the size of the surface Web by a factor of forty. | |
| Fastest growing category of new information on the Internet. | |
| Fifty per cent greater monthly traffic than surface sites. |
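The size comparisons on the two "Facts" slides can be checked with simple division. The sketch below only uses the figures quoted on the slides; the variable and function names are made up for illustration.

```python
# Figures exactly as quoted on the slides above (terabytes and documents).
SURFACE_TB = 19
DEEP_TB = 7_500
TOP_60_SITES_TB = 750
SURFACE_DOCS = 3_000_000_000
DEEP_DOCS = 550_000_000_000


def times_larger(deep: float, surface: float) -> float:
    """Return how many times larger `deep` is than `surface`."""
    return deep / surface


size_ratio = times_larger(DEEP_TB, SURFACE_TB)
docs_ratio = times_larger(DEEP_DOCS, SURFACE_DOCS)
top_60_ratio = times_larger(TOP_60_SITES_TB, SURFACE_TB)

print(f"Deep Web vs. surface Web, by storage:   {size_ratio:.0f}x")
print(f"Deep Web vs. surface Web, by documents: {docs_ratio:.0f}x")
print(f"Sixty largest deep sites vs. surface:   {top_60_ratio:.0f}x")
```

By storage the ratio comes out at roughly 395, by document count at roughly 183, and the sixty largest sites alone at roughly 39 times the surface Web, which matches the "forty times" on the slide.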
| Invisible Web Facts |
| Narrower, with deeper content, than conventional surface sites. | |
| More than half of the content resides in topic-specific databases. | |
| Content is highly relevant to every information need, market, and domain. |
| Invisible Web Facts |
| Not well known to the Internet-searching public | |
| Searching, Searching, Searching |
| Usually carried out using a "directory" or "search engine". | |
| Fast and efficient. | |
| Misses most of what is out there. | |
| 70% of searchers start from 3 sites (Nielsen, 7/2004): | |
| Google, Yahoo, MSN. |
| Searching Tools |
| Directories. | |
| Search engines. |
| Directories |
| Hand selected, evaluated, annotated. | |
| Broad topics work best. | |
| Quality over quantity. | |
| Location on list: May be paid. |
| How Directories Work |
| Directory Problems |
| Done by humans. | |
| Takes time. | |
| No universal categories or cataloging system. | |
| Misses most information/sites. |
| General Subject Directories |
| "Yahoo". | |
| Biggest and most famous. | |
| Often useful. | |
| Information... jobs... travel...
shopping... to... |
|
| Yahoo.com |
| Search Engines |
| Computer generated. | |
| Must be static and linked. | |
| Narrower topics. Quantity over quality. | |
| Uses newer retrieval technologies. | |
| Location on list: May be paid. | |
| Google, Hotbot, Northern Light, AltaVista, etc. |
| How Search Engines Work |
| Search Engine Problems |
| Spiders/robots don't think. | |
| More likely to index sites with more links to them (popularity). | |
| More likely to index US sites. | |
| More likely to index commercial sites. | |
| Sites pay for indexing/position. |
| At one time showed actual bid! |
| Finding Good Search Engines |
| UC-Berkeley: Recommended Search
Engines: http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html |
|
| UC-Berkeley: The Best Search Engines (9/2004): | |
| #1 Google #2 Teoma | |
| #3 Yahoo! Search | |
| Who are you really searching? |
| Who are you really searching? |
| What Do We Miss? |
| Library of Congress: 30 million+ documents. | |
| ERIC databases. | |
| Most daily newspapers. | |
| Health and medical databases. | |
| Museum and library collections. | |
| The information you need???? | |
| Why are pages invisible? (1) |
| 1. Searchable databases: | |
| Typing required. | |
| Selection of option
combination required. |
|
| **Pages not available until asked for (ex: Library of Congress). | |
| **Pages are not static but dynamic (may not exist until requested). | |
| Why are pages invisible? (1) |
| Search engines can't handle "dynamic pages". | |
| Search engines can't handle "input boxes". |
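To illustrate why input boxes stop a spider, here is a minimal sketch of a link-following parser built on Python's standard `html.parser`. It is not how any particular search engine works; the class name, helper function, and sample page are hypothetical. The point is only that a spider collects `href` links and has nothing to type into a form.

```python
from html.parser import HTMLParser


class LinkOnlySpider(HTMLParser):
    """Toy spider: records <a href> links and merely counts forms it cannot use."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms_skipped = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            # A static link: the spider can follow and index this page.
            self.links.append(attributes["href"])
        elif tag == "form":
            # A search form: pages behind it exist only after someone types a query.
            self.forms_skipped += 1


def crawlable_links(html):
    """Return (links the spider can follow, number of forms it had to skip)."""
    spider = LinkOnlySpider()
    spider.feed(html)
    return spider.links, spider.forms_skipped


# Hypothetical page: one ordinary link and one database search box.
SAMPLE_PAGE = """
<a href="/about.html">About</a>
<form action="/search"><input name="title"><input type="submit"></form>
"""

links, forms = crawlable_links(SAMPLE_PAGE)
print("Can follow:", links)
print("Forms skipped:", forms)
```

Everything reachable only through the form stays out of the index, which is the "searchable database" case described above.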
| Why are pages invisible? (2) |
| 2. Password or Login required: (Spiders do not know passwords or login IDs). |
||
| 3. Non-HTML pages: | ||
| PDF, Word, Shockwave, Flash... | ||
| Some search engines may find them:
ex: Google, AltaVista |
||
| Why are pages invisible? (3) |
| 4. Script-based (computer generated) pages: | ||
| Create all or part of Web page. | ||
| Contain "?" in URL. | ||
| Spiders programmed to back off. | ||
| http://calver.org/search/file/ship (yes!) | ||
| http://calver.org/search?title=plane (no) | ||
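The "?" rule above can be sketched in a few lines of Python. This is an assumed simplification of the back-off behavior, not any real spider's logic; the function name is hypothetical, and it uses the standard `urllib.parse.urlsplit` to check for a query string.

```python
from urllib.parse import urlsplit


def spider_would_back_off(url):
    """Return True if the URL carries a query string (the part after "?")."""
    return urlsplit(url).query != ""


# The two example URLs from the slide.
for url in (
    "http://calver.org/search/file/ship",
    "http://calver.org/search?title=plane",
):
    verdict = "no (backs off)" if spider_would_back_off(url) else "yes (indexed)"
    print(f"{url} -> {verdict}")
```

Under this rule the first URL looks like a static path and is indexed, while the second is treated as a script-generated page and skipped.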
| Sites to Check |
| Finding Invisible Information |
| "Librarians' Index". | |
| Compiled by librarians in the "information supply business". | |
| Highest quality sites only. | |
| Reliable, annotated. | |
| www.lii.org |
| Finding Invisible Information |
| "About". | |
| 2,400,000 + resources. | |
| Wide variety of subjects: Teens, religion, spirituality, shopping, (expected) | |
| About.com |
| Finding Invisible Information |
| "direct search". | |
| "Data not easily or entirely searchable/accessible from general search tools." | |
| www.freepint.com/gary/direct.htm |
| Finding Invisible Information |
| "The Invisible Web Catalog". | |
| 10,000 + searchable databases. | |
| Quick search, "Hot List" | |
| Sort alphabetically or by score (relevance). | |
| www.profusion.com | |
| Finding Invisible Information |
| www.invisible-web.net |
| Finding Invisible Information |
| "IncyWincy". | |
| Over 100,000 databases. | |
| Many links to other search engines. | |
| www.incywincy.com |
| Finding Invisible Information |
| "CompletePlanet": | |
| 103,000 + databases and specialty search engines. | |
| Some 'surface' searching. | |
| www.completeplanet.com |
| Finding Invisible Information |
| Some are research oriented. | |
| "Infomine". | |
| Infomine.ucr.edu/ | |
| "Academic Info". | |
| www.academicinfo.net |
| So... What To Do... |
| Search several sites. | |
| Use the "Advanced Search" feature. |
|
| Search using the term "Invisible Web" for IW search sites. |
|
| Search several "Invisible Web" sites. | |
| Questions ? |
| PowerPoint available at descy.net |