A search engine is a database of resources
extracted from the Internet through an automated
"crawling" process. This database is searchable
through user queries.
How does a search engine work?
Words or phrases you enter in the search box
are matched to resources in the search engine's
database that contain your terms. The matching
resources are then automatically sorted by their
probable relevance, with the most "relevant"
sites appearing first.
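Here is a minimal sketch of that matching-and-sorting step in Python. It keeps only pages that contain every query term and orders them by a relevance score. The scoring rule (total occurrences of the query terms) and the sample pages are assumptions for illustration. Real engines combine many more signals.

```python
from collections import Counter

# Hypothetical mini-database: URL -> page text (illustrative data only).
DATABASE = {
    "a.example/cats": "cats are small cats and cats purr",
    "b.example/pets": "dogs and cats are popular pets",
    "c.example/cars": "fast cars and slow cars",
}


def search(query: str, database: dict) -> list:
    """Return URLs whose text contains every query term, most 'relevant' first.

    Assumed relevance score: total occurrences of the query terms in the page.
    """
    terms = query.lower().split()
    scored = []
    for url, text in database.items():
        counts = Counter(text.lower().split())
        # Match step: keep only resources containing all of the user's terms.
        if all(counts[term] > 0 for term in terms):
            scored.append((sum(counts[term] for term in terms), url))
    # Sort step: highest score first, ties broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


print(search("cats", DATABASE))  # ['a.example/cats', 'b.example/pets']
```

The cats page ranks first because "cats" occurs three times there and only once on the pets page. The cars page is excluded because it does not contain the term at all.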
The index of a crawler-based search engine is
built by robots (also called spiders or web
crawlers) that operate on a fixed set of
instructions. The robot selects a page to visit
from a list of links (a "queue") gathered from
previously crawled web pages. It fetches the
page, collects certain information (such as
visible text, meta tags and links) and sends it
to an indexing program, which enters it into a
database where it is ready for search queries.
The newly gathered links are then added to the
queue for a future visit, and the process begins
again.
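The crawl cycle can be sketched as a breadth-first loop over a queue. To keep the example self-contained, it uses a hypothetical in-memory WEB dictionary in place of real HTTP fetches. A simple inverted index (each word mapped to the set of URLs containing it) stands in for the indexing program and database.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (visible text, outgoing links).
# A real crawler would fetch pages over the network; this only simulates it.
WEB = {
    "home": ("welcome to the home page", ["about", "news"]),
    "about": ("about our search project", ["home"]),
    "news": ("latest search news", ["archive", "home"]),
    "archive": ("old news archive", []),
}


def crawl(seed: str, web: dict) -> dict:
    """Crawl breadth-first from `seed`, returning an inverted index: word -> set of URLs."""
    queue = deque([seed])  # links waiting to be visited
    visited = set()
    index = {}
    while queue:
        url = queue.popleft()  # select the next page from the queue
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]  # "fetch" the page
        # Indexing step: record which pages contain each word.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Add newly gathered links to the queue for a future visit.
        for link in links:
            if link not in visited:
                queue.append(link)
    return index


index = crawl("home", WEB)
print(sorted(index["search"]))  # ['about', 'news']
```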
Link Analysis
Every major search engine uses link analysis
as part of its ranking algorithm, according to
Danny Sullivan, editor of Search Engine Watch.
Link analysis differs from link popularity in
that each link is given a "weight" (rank of
importance) determined by a preset calculation,
whereas link popularity ranks a web page's
importance simply by how many hyperlinks point
to that page, regardless of where they come
from.
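This small sketch scores one hypothetical link graph both ways to make the contrast concrete. The weighting formula in weighted_links (1 plus the linking page's own inbound count) is an invented stand-in for a "preset calculation". It is not any search engine's actual formula.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "hub1": ["portal"],
    "hub2": ["portal"],
    "hub3": ["portal"],
    "portal": ["niche"],
    "spam1": ["farm"],
    "spam2": ["farm"],
    "niche": [],
    "farm": [],
}


def link_popularity(links: dict) -> Counter:
    """Link popularity: count inbound links, regardless of where they come from."""
    counts = Counter()
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def weighted_links(links: dict) -> Counter:
    """Weighted link analysis: each inbound link counts 1 + the linking page's popularity.

    This weight formula is an invented stand-in for a "preset calculation".
    """
    popularity = link_popularity(links)
    scores = Counter()
    for source, targets in links.items():
        for target in targets:
            scores[target] += 1 + popularity[source]
    return scores


print(link_popularity(LINKS)["farm"], link_popularity(LINKS)["niche"])  # 2 1
print(weighted_links(LINKS)["farm"], weighted_links(LINKS)["niche"])    # 2 4
```

Under link popularity, farm outranks niche (two inbound links versus one). Once links are weighted, niche's single link from the well-linked portal outweighs farm's two links, which come from pages nobody links to.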
According to Google: "In essence, Google
interprets a link from page A to page B as a
vote, by page A, for page B, but Google looks at
more than the sheer volume of votes, or links a
page receives; it also analyzes the page that
casts the vote. Votes cast by pages that are
themselves 'important' weigh more heavily and
help to make other pages 'important.' Important,
high-quality sites receive a higher PageRank,
which Google remembers each time it conducts a
search.... Google combines PageRank with
sophisticated text-matching techniques to find
pages that are both important and relevant to
your search. Google goes far beyond the number
of times a term appears on a page and examines
all aspects of the page's content (and the
content of the pages linking to it) to decide if
it's a good match for your query".
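The quote's idea that votes from important pages weigh more is the core of PageRank as originally published. Below is a sketch of the simplified textbook version, computed by power iteration. The damping factor of 0.85 is the value commonly cited from the original paper. The sketch does not reflect Google's production ranking, which, as the quote notes, also combines text matching.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 100) -> dict:
    """Simplified textbook PageRank computed by power iteration.

    links maps each page to the pages it links to (its "votes").
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for source, targets in links.items():
            if targets:
                # A page splits its own current rank equally among the pages it votes for,
                # so votes from highly ranked pages carry more weight.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a and b each get one vote, but a's vote comes from c.
GRAPH = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(GRAPH)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b', 'd']
```

Pages a and b each have exactly one inbound link. Page a ranks higher because its vote comes from c, the best-linked page in the graph.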
Teoma, owned by Ask Jeeves, makes use of what
it calls "Subject-Specific Popularity." This
technology, according to Teoma, "ranks a site
based on the number of same-subject pages that
reference it, not just general popularity."
Teoma's process allows for a fine-tuned search
by using the authority of a link as part of its
relevance calculation. Web sites are grouped
into topic-based "communities," and searches
are then further refined within those
communities using Subject-Specific Popularity.
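Teoma's exact algorithm is not described here, so the following is only a rough sketch of the idea as quoted. A page's inbound links count only when they come from pages in the same subject community. The topic labels are hypothetical and assumed to be known in advance. The community grouping itself is not modeled.

```python
from collections import Counter

# Hypothetical topic labels; assigning pages to communities is not modeled here.
TOPICS = {
    "birdwatch.example": "birds",
    "owlfans.example": "birds",
    "owl-guide.example": "birds",
    "bird-memes.example": "birds",
    "gossip.example": "entertainment",
    "celebrity.example": "entertainment",
    "sports.example": "sports",
}

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "birdwatch.example": ["owl-guide.example"],
    "owlfans.example": ["owl-guide.example"],
    "gossip.example": ["bird-memes.example"],
    "celebrity.example": ["bird-memes.example"],
    "sports.example": ["bird-memes.example"],
}


def subject_specific_popularity(links: dict, topics: dict, topic: str) -> Counter:
    """Count inbound links to pages in `topic` that come from pages in the same topic."""
    scores = Counter()
    for source, targets in links.items():
        if topics.get(source) != topic:
            continue  # ignore votes from outside the community
        for target in targets:
            if topics.get(target) == topic:
                scores[target] += 1
    return scores


general = Counter(target for targets in LINKS.values() for target in targets)
birds = subject_specific_popularity(LINKS, TOPICS, "birds")
print(general.most_common(1), birds.most_common(1))
# [('bird-memes.example', 3)] [('owl-guide.example', 2)]
```

bird-memes.example has the most inbound links overall, but none of them come from bird pages. Within the "birds" community, owl-guide.example therefore ranks first.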
A page's outgoing links are not counted toward
its own ranking, and for good reason. Think
about it for a moment: the web developer creates
the outgoing links. If those links counted, the
developer would only need to link to the most
popular sites on the web to raise the site's
search engine listing position.
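A toy example makes the manipulation argument concrete. Both scoring functions below are hypothetical. inbound_score counts only links from other pages. naive_score also credits a page's own outgoing links, which is the flawed design the paragraph warns against.

```python
def inbound_score(links: dict, page: str) -> int:
    """Score a page only by links that other pages point at it."""
    return sum(targets.count(page) for source, targets in links.items() if source != page)


def naive_score(links: dict, page: str) -> int:
    """Hypothetical flawed score that also credits the page's own outgoing links."""
    return inbound_score(links, page) + len(links.get(page, []))


# Hypothetical link graph: page -> pages it links to.
web = {
    "news.example": ["wiki.example"],
    "blog.example": ["wiki.example", "news.example"],
    "myshop.example": [],
}
before = (inbound_score(web, "myshop.example"), naive_score(web, "myshop.example"))
# The developer of myshop.example links to the most popular sites.
web["myshop.example"] = ["wiki.example", "news.example", "blog.example"]
after = (inbound_score(web, "myshop.example"), naive_score(web, "myshop.example"))
print(before, after)  # (0, 0) (0, 3)
```

The naive score jumps from 0 to 3 purely because of links the developer controls. The inbound-only score does not move.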