You've probably heard of Alta Vista and HotBot, both popular search
indexes. Indexes regularly scan the Internet for Web pages and record their HTML
content and keywords. They can also follow the links on scanned pages to gather
even more information.
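Here is a minimal Python sketch of the crawling described above: a spider that records pages and follows their links breadth-first. It uses a tiny, hypothetical in-memory "Web" (a dict of made-up URLs mapped to HTML) instead of real network fetches. The names `crawl` and `WEB` are illustrative only.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(web, start_url):
    """Breadth-first crawl over a dict of {url: html}; returns URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # dead link: nothing to fetch, so skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "Web" standing in for real network fetches (assumption).
WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">back</a>',
    "http://c.example/": '<a href="http://gone.example/">dead</a>',
}

print(crawl(WEB, "http://a.example/"))
```

Starting from `http://a.example/`, the sketch visits A, B, and C, and it skips the link to `http://gone.example/` because that page doesn't exist. The `seen` set keeps the spider from looping forever between pages that link to each other.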
The job of compiling data for indexes is done by spiders
(also called robots, bots, or crawlers, hence the names HotBot and WebCrawler):
programs written to automatically gather information from all over the 'Net
based on specific or broad search criteria. Most of the time, spiders scan pages
on the fly, without the owner's knowledge or consent. (If you don't want some or
all of your Web pages scanned by spiders, you can add some HTML to your pages to
keep them out.)
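The HTML mentioned above is commonly the robots meta tag, for example `<meta name="robots" content="noindex, nofollow">` in a page's head. A spider that honors this tag will neither index the page nor follow its links. The tag is a request, not a lock, so only well-behaved spiders obey it. The Python sketch below shows how such a spider might check the tag. The function names are hypothetical, and a real spider would typically also consult the site's robots.txt file.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, nofollow"
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots directives a page declares (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html):
    return "noindex" not in robots_directives(html)


def may_follow_links(html):
    return "nofollow" not in robots_directives(html)


page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>...</body></html>'
print(may_index(page), may_follow_links(page))  # False False
```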
The advantage of this kind of service is that its databases are
very large and frequently updated by spiders working around the clock. They
catalog Web pages automatically, without human intervention. A search engine's
spider catalogs all the pages of a given Web site, and the engine then lists only
the pages that match the words or phrases you're searching for.
For instance, if you're looking for information about spiders,
you'll get over thirty-nine thousand hits (links to Web pages) from Alta Vista
containing the word spiders. Besides pages about Internet robots, you'll mostly
get pages about the eight-legged,
living-in-your-shoe-and-going-to-bite-you kind of spider.
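Matching pages to your words is usually done with an inverted index, a table that maps each word to the pages containing it. The Python sketch below assumes the spider has already fetched each page as plain text. It treats a hit as a page that contains every query word, which is a simplifying assumption, since real engines also rank and weight results. The pages and URLs are made up, but they reproduce the problem above: a search for spiders returns the garden page alongside the robot page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every word of the query, sorted for stable output."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())  # keep pages that have this word too
    return sorted(matches)


# Hypothetical pages a spider might have fetched (made-up URLs and text).
PAGES = {
    "http://bugs.example/": "Garden spiders spin webs",
    "http://bots.example/": "Search spiders crawl the Web",
    "http://modems.example/": "Fast modems for sale",
}
index = build_index(PAGES)
print(search(index, "spiders"))
print(search(index, "spiders web"))
```

The first query returns both the garden page and the robot page. Adding the word web narrows the result to the robot page only.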
A drawback to services of this type is that sifting through so many
hits to find what you're looking for can be a daunting task. Some indexes offer
options to help you narrow your search, such as HotBot's "search for this exact
phrase" and "search for any of the words."
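The two options above differ in how strictly they match. The Python sketch below models them as simple functions over a page's text. It is a simplified illustration under that assumption, not HotBot's actual implementation, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_any(text, query):
    """'Any of the words': true if the page contains at least one query word."""
    return bool(set(tokenize(query)) & set(tokenize(text)))


def matches_phrase(text, query):
    """'Exact phrase': true if the query words appear consecutively, in order."""
    words, phrase = tokenize(text), tokenize(query)
    n = len(phrase)
    if n == 0:
        return False
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


page = "Robots, spiders, and other Web crawlers"
print(matches_any(page, "web robots"))       # True
print(matches_phrase(page, "web robots"))    # False
print(matches_phrase(page, "web crawlers"))  # True
```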
On the other hand, Yahoo! and Magellan are hierarchical directories
of Web page subjects. Each entry is added and updated manually by a person, who
places each Web address in a particular category, much like a listing in your
telephone company's Yellow Pages.
People catalog the sites in a directory, so the hits often
include reviews and/or recommendations, which can guide you through the content
of the pages more quickly and easily.
To have a Web site listed in a directory, you must submit it
yourself or hire a company to do it for you. The directory has the final say on
where your site is cataloged. Because every listing is submitted and placed by
hand, directories contain far fewer sites than indexes do, but their listings are
better targeted to the words you search with.
For example, if you enter the same keyword, spiders, in Yahoo!,
this time you'll get a list of categories such as Science: Zoology: Animals,
Insects, and Pets: Arachnids or Computers and Internet: Internet: World Wide
Web: Searching the Web: Robots, Spiders, etc. Documentation, which can narrow and
shorten your search significantly.
You'll get fewer hits overall, and they'll point to pages whose headings and
content fit the context of the keywords you enter.
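Because editors file each site under a category path, a directory search can return categories rather than raw pages. The Python sketch below works on a hypothetical list of hand-entered listings, each holding a category path, a site title, and an editor's description. The category paths follow the Yahoo! example above. The sites, descriptions, and the Hardware: Modems category are invented for illustration, and the substring matching rule is an assumption.

```python
def find_categories(listings, keyword):
    """Return the category paths (joined with ': ') of listings that mention the keyword."""
    keyword = keyword.lower()
    paths = {": ".join(path) for path, title, description in listings
             if keyword in title.lower() or keyword in description.lower()}
    return sorted(paths)


# Hypothetical hand-entered listings: (category path, site title, editor's description).
LISTINGS = [
    (("Science", "Zoology", "Animals, Insects, and Pets", "Arachnids"),
     "Garden Spider Gallery", "Photos of garden spiders"),
    (("Computers and Internet", "Internet", "World Wide Web", "Searching the Web",
      "Robots, Spiders, etc. Documentation"),
     "Robot FAQ", "How Web spiders crawl sites"),
    (("Computers and Internet", "Hardware", "Modems"),
     "Modem Maker", "Fast modems for sale"),
]

for path in find_categories(LISTINGS, "spiders"):
    print(path)
```

Each hit is a category path rather than a single page, so you can pick the context you meant (zoology or Web robots) before browsing any sites.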
One drawback is that Yahoo's hits usually point only to home pages
(the first page of a site). For instance, it would find a home page called
Nancy's Page-O-Spiders but not Nancy's Home, even if that site contains a page
devoted exclusively to spiders. Another drawback is that updating a directory by
hand is tedious and time-consuming, so sites that no longer exist (dead links)
often stay listed long after their demise.
Some search services, such as Infoseek and Excite, use both
schemes: each is both an index and a directory. These services occasionally send
out a spider to collect and cull Web sites, while people catalog the sites that
Web developers submit.
Yahoo's directory is one of the best on the Web, but its coverage
is limited. To fill the gaps, Yahoo! teamed up with Alta Vista to automatically
send your query there if your Yahoo! search finds no matching hits.
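The Yahoo!/Alta Vista arrangement is a simple fallback: try the directory first, and pass the query to the index only when the directory finds nothing. The Python sketch below uses two hypothetical lookup functions as stand-ins for the services. They are not either company's real interface.

```python
from typing import Callable, List, Tuple

Search = Callable[[str], List[str]]


def search_with_fallback(query: str, directory: Search, index: Search) -> Tuple[str, List[str]]:
    """Ask the directory first; forward the query to the index only if there are no hits."""
    hits = directory(query)
    if hits:
        return "directory", hits
    return "index", index(query)


# Hypothetical stand-ins for a small hand-built directory and a large spider-built index.
def tiny_directory(query: str) -> List[str]:
    return {"modems": ["Computers and Internet: Hardware: Modems"]}.get(query.lower(), [])


def big_index(query: str) -> List[str]:
    return [f"http://www.example.com/page{i}?q={query}" for i in range(3)]


print(search_with_fallback("modems", tiny_directory, big_index))
print(search_with_fallback("spiders", tiny_directory, big_index))
```

The function also reports which service answered, so a results page could tell you that your query was forwarded.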
As a rule of thumb, when I have only a general idea of what I'm
looking for, such as modems, I'll start with a directory, which will show me hits
for lots of modem brands and companies that sell modems. But if I know I'm
looking for information about a specific brand of modem, I'll use an index, which
will show me many sites with that brand name somewhere on the page.
No single service catalogs the whole Web. Each service covers part
of it, and the services overlap. Services also rank hits in their own ways. For
instance, some services let advertisers pay for priority listings, so an
advertiser's site may appear in your results even if it has nothing to do with
what you're looking for. Knowing this, it's a good idea to use more than one
search service when you're looking for something.
There are hundreds of search tools out there, so don't limit
yourself to the big, popular ones. There are even search services for special
interests, such as art or science. Some search for email addresses, home
addresses, or phone numbers; some search only Usenet; some search both the Web
and Usenet, and more. Look for all-in-one search engines such as Dogpile or
Metafind, which submit your keywords to many engines at once and list the first
ten or so hits from each service on one seemingly unending page.
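An all-in-one engine like the ones above sends one query to several services and lists the top hits from each. The Python sketch below assumes each engine is a plain function that returns a ranked list of URLs. The engines are hypothetical stand-ins, not Dogpile's or Metafind's real back ends. Dropping a URL that an earlier engine already returned is a design choice of this sketch.

```python
from typing import Callable, Dict, List

Engine = Callable[[str], List[str]]


def meta_search(query: str, engines: Dict[str, Engine], per_engine: int = 10) -> List[str]:
    """Send the query to every engine and list each one's top hits, skipping repeats."""
    seen = set()
    results = []
    for name, engine in engines.items():
        for url in engine(query)[:per_engine]:  # only the first few hits per engine
            if url not in seen:
                seen.add(url)
                results.append(f"{name}: {url}")
    return results


# Hypothetical engines that ignore the query and return fixed, made-up hits.
ENGINES: Dict[str, Engine] = {
    "index-a": lambda q: ["http://a.example/1", "http://shared.example/"],
    "index-b": lambda q: ["http://shared.example/", "http://b.example/1"],
}

for line in meta_search("modems", ENGINES, per_engine=2):
    print(line)
```

The shared page appears only once, credited to the first engine that returned it. Removing the `seen` check would reproduce the longer, repetitive pages that some all-in-one engines show.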
Try a variety of search services with your favorite hobby as the
keyword(s), and you'll see how radically the hits differ from one directory or
index to the next. No single service is perfect, so use as many as you have time
for. Using many search engines will also help you get a feel for how the
different kinds of services work. You'll soon find yourself using a favorite
engine to find all the information you need quickly and painlessly.
The Internet is in a state of constant change. Internet
addresses disappear as fast as new ones are created, or faster. Many sites
relocate without telling anyone, and dead listings are everywhere, so finding a
page that has moved often means using more than one engine.
Happy searching!