A Helpful Guide to Web Search Engines -- Search Engines Ranking Methods and Algorithms

How To Use Web Search Engines
Tips on using internet search sites like Google, alltheweb, and Yahoo.

Search Engine Ranking Algorithms

by Curt A. Monash

Page Rank

Search engine ranking algorithms are closely guarded secrets, for at least two reasons: search engine companies want to protect their methods from their competitors, and they also want to make it difficult for web site owners to manipulate their rankings.

That said, a specific page's relevance ranking for a specific query currently depends on three factors:

1. Its relevance to the words and concepts in the query
2. Its overall link popularity
3. Whether or not it is being penalized for excessive search engine optimization (SEO).

Examples of SEO abuse include large groups of sites that link to one another in a circular scheme, and excessive, highly ungrammatical keyword stuffing.

Factor #2 was pioneered by Google with PageRank. Essentially, the more incoming links your page has, the better. But it is more complicated than that. PageRank is a tricky concept because it is circular, as follows: Every page on the Internet has a minimum PageRank score just for existing. About 85% of a page's PageRank (at least, that's the best-known estimate, based on an early paper) is passed along to the pages that page links to, divided more or less equally among its outgoing links. A page's PageRank is the sum of the minimum value plus all the PageRank passed to it via incoming links.

Although this is circular, mathematical algorithms exist for calculating it iteratively.
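To make the iterative idea concrete, here is a minimal Python sketch of the calculation as described above. It is not Google's actual code. The function name, the minimum score of 0.15, and the tiny three-page link graph are all assumptions chosen for illustration; only the 85% figure comes from the text.

```python
def pagerank(links, damping=0.85, minimum=0.15, iterations=100):
    """Iteratively estimate raw PageRank.

    links maps each page to the list of pages it links to.
    damping is the share of a page's score passed along its links (85% above).
    minimum is the score every page gets just for existing (assumed value).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: minimum for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the minimum score.
        new_rank = {page: minimum for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # a page with no outgoing links passes nothing along
            # Split the passed-along share equally among outgoing links.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = pagerank(example_links)
```

Each pass recomputes every score from the previous pass's scores. Because only 85% is passed along each time, the numbers settle down after enough passes, which is how the circular definition gets resolved.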

In one final complication, what I just said applies to "raw PageRank." Google actually reports PageRank scores of 0 to 10 that are believed to be based on the logarithm of raw PageRank (they're reported as whole numbers). And the base of that logarithm is believed to be approximately 6.
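As a rough illustration of that logarithmic scale, the sketch below turns a raw score into a 0 to 10 whole number. The base of 6 is the estimate mentioned above. Everything else is my assumption: the function name, measuring raw scores relative to a minimum raw score (min_raw), and rounding down rather than to the nearest whole number.

```python
import math


def reported_pagerank(raw, min_raw=1.0, base=6.0):
    """Map a raw PageRank score onto a 0-10 whole-number scale (assumed form)."""
    if raw <= min_raw:
        return 0
    # Each step up the scale needs roughly `base` times more raw PageRank.
    level = math.floor(math.log(raw / min_raw, base))
    return max(0, min(10, level))
```

If the scale really does work this way, each additional point requires about six times as much raw PageRank as the one before it, which would explain why so few sites reach the top of the scale.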

Anyhow, about 30 sites on the Web have PageRank 10, including Yahoo, Google, Microsoft, Intel, and NASA. IBM, AOL, and CNN, by way of contrast, were only at PageRank 9 as of early 2004.

Further refinements in link popularity rankings are under development. Notably, link popularity can be made specific to a subject or category; i.e., pages can have different PageRanks for health vs. sports vs. computers vs. whatever. Supposedly, AskJeeves/Teoma already works that way.

It is believed that Inktomi, Altavista, et al. use link popularity in their ranking algorithms, but to a much lesser extent than Google. Yahoo, owner of Inktomi, Altavista, and Alltheweb, is rolling out a new search engine, which reportedly includes a feature called Web Rank. More on how that works soon.

Keyword Search

Most search engines handle words and simple phrases. In its simplest form, text search looks for pages with lots of occurrences of each of the words in a query, stopwords aside. The more common a word is on a page, compared with its frequency in the overall language, the more likely that page will appear among the search results. Hitting all the words in a query is a lot better than missing some.
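Here is a simplified Python sketch of that kind of text scoring. It is an illustration under stated assumptions, not any real engine's formula: the stopword list, the language_freq table of word frequencies, the default_freq fallback, and the way coverage of the query words is rewarded are all hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real engine's list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}


def keyword_score(query, page_text, language_freq, default_freq=1e-6):
    """Score a page by how unusually common the query words are on it."""
    words = page_text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    terms = [w for w in query.lower().split() if w not in STOPWORDS]
    if not terms:
        return 0.0
    score = 0.0
    matched = 0
    for term in terms:
        page_freq = counts[term] / total
        if page_freq > 0:
            matched += 1
            # Compare frequency on the page with frequency in the language.
            score += page_freq / language_freq.get(term, default_freq)
    # Hitting all the query words is better than missing some.
    return score * (matched / len(terms))
```

With a query of "search engine", a page that contains both words will generally outscore a page that repeats only one of them, because the second page is scaled down for missing a query word.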

Search engines also make some efforts to “understand” what is meant by the query words. For example, most search engines now offer optional spelling correction. And increasingly they search not just on the words and phrases actually entered, but they also use stemming to search for alternate forms of the words (e.g., speak, speaker, speaking, spoke). Teoma-based engines are also offering refinement by category, à la the now-defunct Northern Light. However, Excite-like concept search has otherwise not made a comeback yet, since the concept categories are too unstable.

When ranking results, search engines give special weight to keywords that appear:

High up on the page
In headings
In BOLDFACE (at least in Inktomi)
In the URL
In the title (important)
In the description
In the ALT tags for graphics
In the generic keywords metatags (only for Inktomi, and only a little bit even for them)
In the link text for inbound links.

More weight is put on the factors that the site owner would find it awkward to fake, such as inbound link text, page title (which shows up on the SERP -- Search Engine Results Page), and description.
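One way to picture this weighting is as a table of per-location weights. The sketch below is purely illustrative: the field names and every number in FIELD_WEIGHTS are made up, since the real weights are exactly the sort of thing the search engines keep secret. The only thing taken from the text is the rough ordering, with inbound link text, title, and description counting most and the keywords metatag counting least.

```python
# Hypothetical weights; the real values are not public.
FIELD_WEIGHTS = {
    "inbound_link_text": 3.0,
    "title": 3.0,
    "description": 2.0,
    "headings": 1.5,
    "url": 1.5,
    "bold": 1.0,
    "alt_text": 1.0,
    "body": 1.0,
    "meta_keywords": 0.2,
}


def field_score(keyword, page_fields, weights=FIELD_WEIGHTS):
    """Add up the weight of every page location that contains the keyword."""
    keyword = keyword.lower()
    score = 0.0
    for field, text in page_fields.items():
        if keyword in text.lower():
            score += weights.get(field, 0.0)
    return score
```

Under these made-up numbers, a page with the keyword in its title outscores a page that mentions it only in the keywords metatag, which matches the general advice above.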

How sites get into search engines

The base case is that spiders crawl the entire Web, starting from known pages and following all links, and also crawling pages that are hand-submitted. Google still works pretty much that way. If a site has high PageRank, it is spidered more often and more deeply.
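The base case can be sketched as a simple breadth-first crawl. To keep the example self-contained, the "Web" here is a hypothetical in-memory dictionary mapping each page to the pages it links to, rather than real HTTP fetching. The function and parameter names are my own, and the sketch ignores the PageRank-based scheduling just mentioned.

```python
from collections import deque


def crawl(web, known_pages, submitted_pages=()):
    """Visit every page reachable from known or hand-submitted pages.

    web maps each page to the list of pages it links to (a stand-in for
    fetching a page and extracting its links).
    """
    frontier = deque(list(known_pages) + list(submitted_pages))
    visited = []
    seen = set(frontier)
    while frontier:
        page = frontier.popleft()
        visited.append(page)
        # Follow all links, skipping pages already queued or visited.
        for target in web.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return visited
```

Note what this implies for site owners: a page that nothing links to, and that nobody submits by hand, is never found by a spider working this way.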

However, search engines are trying to encourage site owners to pay for the privilege of having their pages spidered. Teoma's index is very hard to get into without paying money, and Inktomi's isn't that easy either. And even if you do get into Inktomi for free, they'll take a long time to respider, while if you pay they respider constantly. One advantage of being respidered often is that you can tweak your page to come up higher in their relevancy rankings, then see if your changes worked.

Finally, you can also pay to appear on a search page. That is, your link will appear when someone searches on a specific keyword or keyphrase. Google does a good job of making it pretty clear which results (at the top or on the right of the page) are paid; other engines may not make the distinction as clear.

Paid search results are typically all pay-per-click, based on keyword. The advertiser pays the search engine vendor a specific amount of money each time an ad is clicked on, this fee having been determined by an auction of each keyword or keyphrase.
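The arithmetic of pay-per-click can be sketched in a few lines of Python. This is a deliberately simplified model, and the details are assumptions: I assume each advertiser simply pays its own bid per click and that the ad slots go to the highest bidders. Actual auction rules vary by vendor and are not described here. The advertiser names and amounts are made up.

```python
def rank_ads(bids, slots):
    """Return the advertisers who win the ad slots, highest bid first."""
    ordered = sorted(bids, key=lambda advertiser: bids[advertiser], reverse=True)
    return ordered[:slots]


def click_charges(bids, clicks):
    """Charge each advertiser its per-click bid times its number of clicks."""
    return {advertiser: bids[advertiser] * n for advertiser, n in clicks.items()}


# Hypothetical bids (in dollars per click) on a single keyphrase.
example_bids = {"x": 0.50, "y": 1.25, "z": 0.75}
example_winners = rank_ads(example_bids, slots=2)
example_charges = click_charges(example_bids, {"y": 4, "z": 2})
```

The point of the model is that an advertiser owes nothing for merely being displayed; charges accrue only when someone actually clicks.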

The Spider's Apprentice was conceived and written by Linda Barlow, who maintains this site for Monash Information Services. Copyright, 1996-2004. All rights reserved.
Updated: 05/11/04
