Google PageRank, Keywords and Anchor Link text

When a visitor arrives at a web site, hopefully yours, the challenge is to keep the visitor there by providing valuable or useful information. This is very difficult. It can be accomplished through web page layout and by providing an easy and meaningful way to get to the next part of the web site's story through a well-located, descriptive text link.

You may have noticed that more and more web sites are abandoning button-type links, for the simple reason that buttons cannot accommodate much text and they add to download time (the buttons are pictures).

The worst kind of buttons are those that say Next, Back, Home, etc., since they do not encourage any visitor to explore. If instead of Next the link said "click here for a free offer", chances are that the link would be clicked and that little bit of extra time would be available to convince your visitor of the value of your site.

If instead of Home it said "start page; subscribe to our free gardening newsletter", the visitor would at least know what could be accomplished at the home page.

Too many web sites are not designed to be friendly and useful. The designer never considered the site design from the visitor's perspective.

The vital importance of anchor link text to Google success

This is a very appropriate time to introduce Sergey Brin's and Lawrence Page's pioneering search engine work to you. If you really want to make a difference to your search engine success, the paper these two young men presented at the 7th World Wide Web Conference in 1998, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", is a must-read.

This paper is one of very few published that deal specifically with a search engine's inner workings. To my knowledge it is the only meaningful work publicly available that gives credence to what the good SEOs believe and teach.

At the time of publication Sergey Brin and Lawrence Page were PhD students at Stanford University. They had built a prototype search engine covering some 24 million web pages, which they called Google (after a mathematical entity called a googol, the number 1 followed by 100 zeros).

Brin and Page were acutely aware that the major search engines at that time did not always return quality results and that the commercial world manipulated the SERPs, for example to suit advertisers.

High SERP positions attracted large amounts of ad spend.

The two young men based their new search engine upon what they called citations, which had been the essence of academic success through the years. A citation was essentially a reference to a published academic paper. The more citations an academic work collected, the more valuable that work was judged to be.

Brin and Page argued that the equivalent on the web was the collection of backward links. As such, a very important part of the algorithm they developed was based upon backward links and the text associated with each backward link.
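To make the idea of backward links concrete, here is a minimal sketch that turns a list of ordinary (forward) links into a per-page list of backward links together with their anchor text. The page names and anchor texts are invented for illustration, and the function name backward_links is my own; nothing here is taken from the paper's actual implementation.

```python
from collections import defaultdict

# Hypothetical forward links: (source page, destination page, anchor text).
LINKS = [
    ("a.example/home", "b.example/ponds", "water gardening guide"),
    ("c.example/blog", "b.example/ponds", "pond pumps"),
    ("b.example/ponds", "a.example/home", "garden centre"),
]


def backward_links(links):
    """Group links by destination, so each page lists who cites it and with what text."""
    incoming = defaultdict(list)
    for source, destination, anchor_text in links:
        incoming[destination].append((source, anchor_text))
    return dict(incoming)


if __name__ == "__main__":
    for page, citations in backward_links(LINKS).items():
        # The number of citations is the page's backward link count.
        print(page, len(citations), citations)
```

Each destination page ends up with its "citations": the pages that link to it and the words they used to do so.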

The immediate problem they faced was that for every web page there were about 7 backward links, which created the need for more and more computer resources. Within the Google engine they developed means for parsing and storing all the words from every web page and for storing the complete HTML of every page. They were able to separate words into these groups:

  • Words in the URL

  • Words in the anchor link text on page and also from other web pages

  • Words in the Title

  • Words in bold

  • Words capitalized

  • Words in the main body text

  • Amongst others

In addition they were able to record how close each word was to every other word (they called this proximity) and also to relate each link to the source web page and the destination web page. This link information was what they used to calculate PageRank, which we will leave until later.

They created word lists which they called Hits. They divided these Hits into fancy Hits and plain Hits.

Fancy Hits were words contained in TITLES and LINKS. These Hits were kept in what were called short barrels. All other Hits were kept in long barrels (the query steps quoted further on call these full barrels). These are the terms used by Brin and Page in their groundbreaking paper.

What all this meant in practice was that, in response to a search query, the short barrels were searched first. If enough results were returned to meet the query limit set by Google, the long barrels were not even looked at, since it was deemed that the more relevant web pages had already been found (by searching for TITLE and LINK words only). Herein lies the reason for making sure your keyword is in the TITLE and in LINKS.
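As a rough illustration of the fancy/plain split, the sketch below sorts Hits into a short barrel and a long barrel at indexing time. The hit type names ("title", "anchor", "body" and so on) and the tuple layout are my own assumptions; the real barrels in the paper are compact binary structures, not Python lists.

```python
# Assumption, following the description above: fancy Hits are words in TITLES and LINKS.
FANCY_TYPES = {"title", "anchor"}


def split_into_barrels(hits):
    """Split (word, hit_type, doc_id) tuples into a short and a long barrel."""
    short_barrel, long_barrel = [], []
    for hit in hits:
        _word, hit_type, _doc_id = hit
        if hit_type in FANCY_TYPES:
            short_barrel.append(hit)   # fancy Hit
        else:
            long_barrel.append(hit)    # plain Hit
    return short_barrel, long_barrel


# Hypothetical Hits for one small page (document 1).
EXAMPLE_HITS = [
    ("garden", "title", 1),
    ("garden", "body", 1),
    ("garden", "anchor", 1),
    ("pond", "bold", 1),
]

if __name__ == "__main__":
    short, long_ = split_into_barrels(EXAMPLE_HITS)
    print("short barrel:", short)
    print("long barrel:", long_)
```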

The following description is, of course, a very short paraphrase of the full paper. Sergey Brin and Lawrence Page published the following table of how they carried out a search when a searcher typed in a query. In response to the query Google would:

  1. Parse the query.

  2. Convert words into wordIDs.

  3. Seek to the start of the doclist in the short barrel for every word.

  4. Scan through the doclists until there is a document that matches all the search terms.

  5. Compute the rank of that document for the query.

  6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.

  7. If we are not at the end of any doclist go to step 4.

  8. Sort the documents that have matched by rank and return the top k.

Source: Figure 4, Google Query Evaluation, in the paper referenced above.

Note from the table that when all the query searching has been done only the top k (top 1,000) results are returned. Note also that each result is ranked (step 5 above) for that query.
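The query steps above can be sketched in a few lines. This is a deliberately simplified version: barrels are plain dictionaries mapping a word to a set of document numbers, set intersection stands in for scanning the doclists, and the words themselves stand in for wordIDs. The barrel contents and the example_rank function are invented for illustration.

```python
def evaluate_query(query, short_barrel, full_barrel, rank, k=10):
    """Return up to k document ids that match every query word, best rank first."""
    words = query.lower().split()                 # steps 1-2: parse (words stand in for wordIDs)
    if not words:
        return []
    # Steps 3-4: documents in the short barrel that contain all the words.
    matches = set.intersection(*(short_barrel.get(w, set()) for w in words))
    if len(matches) < k:
        # Step 6: not enough results yet, so consult the full barrel as well.
        matches |= set.intersection(*(full_barrel.get(w, set()) for w in words))
    # Step 5 and step 8: rank each match, sort, and return the top k.
    return sorted(matches, key=lambda doc: rank(doc, words), reverse=True)[:k]


# Hypothetical index: document 2 has both words in its title or links.
SHORT = {"water": {1, 2}, "gardening": {2}}
FULL = {"water": {1, 2, 3}, "gardening": {2, 3}}


def example_rank(doc_id, words):
    """Made-up rank values, for illustration only."""
    return {2: 5.0, 3: 1.0}.get(doc_id, 0.0)


if __name__ == "__main__":
    print(evaluate_query("water gardening", SHORT, FULL, example_rank, k=1))
    print(evaluate_query("water gardening", SHORT, FULL, example_rank, k=2))
```

With k=1 the short barrel alone supplies enough results and the full barrel is never read; with k=2 the search falls through to the full barrel and picks up document 3 as well.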

The paper discusses in broad terms how the ranking process is carried out. The following is my interpretation of this broad description; please bear this in mind, and that I could be wrong.

There is no doubt in my mind that Google takes the following parameters into account in its ranking algorithm, and there is no doubt that a limit is placed upon on-page factors like these. The paper states the limit just as clearly as it states that the terms below were all considered.

Every Hit list contained the following information:

  1. Every word

  2. Plain text large font

  3. Plain text small font

  4. Plain text bold font

  5. Capitalisation information

  6. Word proximity or position in document

  7. It differentiated between the same word in Title, anchor text or URL

  8. And others

Google allocated a weight to each type of Hit and counted the frequency of each Hit of each type. My table below is pure conjecture but serves to highlight what I believe I understand from reading the research paper. Take note that the weights are numbers I have allocated purely for illustration purposes.

Assume the search query word is garden.

Google counts every Hit of the query word garden in the following parameters (and more, of course):

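| Garden found in   | Frequency of garden in Hit type | Weight per single garden Hit | Google's maximum possible dotprod | Actual dotprod |
|-------------------|---------------------------------|------------------------------|-----------------------------------|----------------|
| Title             | 1                               | 100                          | 100                               | 100            |
| URL               | 1                               | 12                           | 12                                | 12             |
| Link on page      | 3                               | 9                            | 54                                | 27             |
| Link to page      | 7                               | 12                           | 168                               | 84             |
| Bold large text   | 2                               | 3                            | 12                                | 6              |
| Normal large text | 3                               | 2                            | 12                                | 6              |
| Normal small text | 20                              | 1                            | 30                                | 20             |
| TOTAL             |                                 |                              | 388                               | 255            |

... see table in book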

Dotprod is the product of weight x frequency; the term is used only because the Brin and Page paper refers to it as dotprod.

If my assumptions are right, and using these purely fictitious numbers, once a total of 388 is reached it would not matter how many more instances of the Hit were recorded. They would not influence the on-page score because the MAXIMUM set by Google's algorithm had been reached at 388. This is why the point was made earlier that only so much optimisation can be achieved by reference to on-page factors alone.
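The arithmetic behind the illustration can be checked with a short sketch. It assumes the maximum applies to each Hit type separately (my reading of the "maximum possible dotprod" column) and it takes the "Link to page" frequency as 7, which is the value that reproduces the 84 shown for that row. The numbers remain the fictitious ones from the illustration, not anything Google has published.

```python
# (hit type, frequency, weight per Hit, maximum dotprod) from the illustration.
HIT_TABLE = [
    ("Title", 1, 100, 100),
    ("URL", 1, 12, 12),
    ("Link on page", 3, 9, 54),
    ("Link to page", 7, 12, 168),   # assumption: 7 Hits, so 7 x 12 = 84
    ("Bold large text", 2, 3, 12),
    ("Normal large text", 3, 2, 12),
    ("Normal small text", 20, 1, 30),
]


def on_page_score(rows):
    """Sum frequency x weight per Hit type, never exceeding each type's maximum."""
    return sum(min(freq * weight, cap) for _name, freq, weight, cap in rows)


def max_on_page_score(rows):
    """The ceiling that no amount of keyword repetition can exceed."""
    return sum(cap for _name, _freq, _weight, cap in rows)


if __name__ == "__main__":
    print(on_page_score(HIT_TABLE))       # actual total
    print(max_on_page_score(HIT_TABLE))   # maximum total
    # Repeating the keyword 100 times more often only reaches the ceiling.
    stuffed = [(n, f * 100, w, c) for n, f, w, c in HIT_TABLE]
    print(on_page_score(stuffed))
```

The last line shows the point about limits: multiplying every frequency by 100 still yields only the maximum total.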

Without going into detail about PageRank, but for the sake of completeness, the above total of 255 scoring points (scoring points is my term) is multiplied by the actual PageRank to get the final rank mark used by Google in arranging its SERPs.

For the sake of clarity, let's assume that this web page had a PageRank of 1,000; the total score for ranking purposes would then be 255,000.

Let's assume that a competing web page scored 139 points for similar on-page factors but had a PageRank of 2,000. This latter web page would be ranked higher than the first one, since it would have a total rank of 2,000 x 139 = 278,000.

If the PageRank of another competing web page was 10,000, then the single presence of the word garden in the TITLE and nowhere else would give a total score of 1,000,000 (100 x 10,000), and the page would rank above both the other examples.

If the PageRank of a competing page was 0 (zero), then even with an on-page score of 388 the total score would be zero. This is how Google can punish spammers.

Similarly, if PageRank was 20,000,000 but the score for on-page factors was zero, then the total would be zero, i.e. the web page would be found right at the end of the SERPs.
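The five cases above can be worked through in a few lines. The sketch simply multiplies the on-page score by PageRank, as in my interpretation; the page labels are invented, and the PageRank values are the illustrative ones used above, not real PageRank figures.

```python
def final_score(on_page, pagerank):
    """Final rank mark under the multiplication assumption used in this article."""
    return on_page * pagerank


# label: (on-page score, PageRank), taken from the examples in the text.
PAGES = {
    "first page": (255, 1_000),
    "competitor": (139, 2_000),
    "title only": (100, 10_000),
    "no PageRank": (388, 0),
    "no on-page hits": (0, 20_000_000),
}


def rank_pages(pages):
    """Return (label, score) pairs sorted from highest to lowest final score."""
    scored = [(label, final_score(op, pr)) for label, (op, pr) in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for label, score in rank_pages(PAGES):
        print(f"{label:16} {score:>10,}")
```

The page with the word in the TITLE only but a high PageRank comes first, and both zero cases fall to the bottom, because a zero on either side of the multiplication wipes out the other factor.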

General principles of using anchor text in links:

  • Most surfers look to the top and the left-hand side of the page for the most important links.

  • No harm is done using bold text for links. It is possible it would help if related to the keyword. Remember Google knows if you have used a bold keyword. To you and me bold signifies more important.

  • Most people expect links to be blue underlined but they can be other colours of course.

  • If you have a web page for which the keyword is garden, and you have put effort into optimizing the on-page factors (such as including garden in the Title and distributing it well through the body text and on-page links), then it is very important that links on the other web pages of your site include the word garden. If your keyword was water gardening, then the whole phrase should be included in the link text pointing back to this page.

  • This is how Google knows that the web page referred to is relevant to water gardening. It may not be high quality (as judged by PageRank) but it is relevant to the query.

  • I belong to the school that believes, rightly or wrongly, that the keyword in the URL is a good thing. This means I consider http://www.pond-pumps.com a better name for a URL than http://www.pondpumps.com. Google does not take hyphens into account; it reads pond-pumps as pond pumps.

  • In encouraging others to link back to your web page, always remember to try to get them to use your own words, which would of course include the keyword. Most people would go along with your request.

  • Given recent information published by Google, it is probable that the following approach will become important in creating a backward link of added value. See below, and note that the link has appropriate normal text containing the keyword in close proximity to the actual link text, which also includes the keyword. This is typical of a normal citation; it is another way of rewarding well-designed web pages and an approach that makes spamming even more difficult. This is also the type of link information provided by most good directories, many of which are manually checked and arranged. We are talking of Google Quotes, which is in its lab testing phase.


Copyright © Tony Roocroft | Tel: +27 11 454 0105