Macalua.com | E-commerce Strategy, SEO, PPC, E-Mail Marketing, Affiliate Marketing and Web Analytics
Optimizing the World One Click at a Time
An Internet Marketing focused blog, with occasional musings about life on and off the battlefield by Marc Hil Macalua, SEO Philippines Founder and Philippine Marketing VP for US Auto Parts Network Inc.
 
Pinoy Top Blogs and 302 Redirects
Posted (Marc) in SEO on July-18-2005

Some search engine related concerns regarding Pinoy Top Blogs’ redirect method

Pinoy Top Blogs is a very nice project in that it adds an acceptable measure of objectivity as far as blog rankings go. Nothing could be sweeter than landing in the top 10. But to track hits to its partner websites, Pinoy Top Blogs may be using, albeit unknowingly, a redirect method that triggers the HTTP 302 exploit in Google. This documented exploit has been used by some webmasters for “page hijacking”.

Page hijacking as a possible SEO technique was documented by Claus Schmidt in his paper, Page Hijack: The 302 Exploit, Redirects and Google. Schmidt’s paper can get a little technical at times, but in essence, the exploit goes like this:

  1. Google follows the redirect to the original site but gives the redirecting site credit for the content.
  2. Google sees two sites with identical content and drops one of them from its index.
  3. Often the original site is the one dropped (read: banned from Google).
  4. Sometimes a malicious webmaster then redirects visitors who click on the hijacked listing to any page of the hijacker’s choosing.

As of May 8, 2005, Schmidt reports that the exploit is still not fixed in Google.

Disclaimer: I do not believe Pinoy Top Blogs has any intentions of abusing the 302 exploit. I do not know Yugatech personally, but I’ve seen his contributions to the Pinoy blogging community and I know I speak for a lot of Pinoy bloggers in saying how appreciative we are of his contributions. As Schmidt says in his paper:

This is a flaw on the technical side of the search engines. Some webmasters do of course exploit this flaw, but almost all cases I’ve seen are not a deliberate attempt at hijacking. The hijacker and the target are equally innocent as this is something that happens “internally” in the search engines, and in almost all cases the hijacker does not even know that (s)he is hijacking another page.

How the Exploit Is Done
Like Schmidt, I am publishing the details of this exploit “to make the problem understandable and visible to as many people as possible in order to force action to be taken to prevent further abuse of this exploit.” Use of the exploit is NOT encouraged or endorsed.

Schmidt outlines the steps necessary for carrying out a 302 redirect hijack:

  1. Googlebot (the “web spider” that Google uses to harvest pages) visits a page with a redirect script. In this example it is a link that redirects to another page using a click tracker script, but it need not be so. That page is the “hijacking” page, or “offending” page.
  2. This click tracker script issues a server response code “302 Found” when the link is clicked. This response code is the important part; it does not need to be caused by a click tracker script. Most webmaster tools use this response code per default, as it is standard in both ASP and PHP.
  3. Googlebot indexes the content and makes a list of the links on the hijacker page (including one or more links that are really a redirect script)
  4. All the links on the hijacker page are sent to a database for storage until another Googlebot is ready to spider them. At this point the connection breaks between your site and the hijacker page, so you (as webmaster) can do nothing about the following:
  5. Some other Googlebot tries one of these links - this one happens to be the redirect script (Google has thousands of spiders, all are called “Googlebot”)
  6. It receives a “302 Found” status code and goes “yummy, here’s a nice new page for me”
  7. It then receives a “Location: www.your-domain.tld” header and hurries to your page to get the content.
  8. It heads straight to your page without telling your server on what page it found the link it used to get there (as, obviously, it doesn’t know - another Googlebot fetched it)
  9. It has the URL of the redirect script (which is the link it was given, not the page that link was on), so now it indexes your content as belonging to that URL.
  10. It deliberately chooses to keep the redirect URL, as the redirect script has just told it that the new location (That is: The target URL, or your web page) is just a temporary location for the content. That’s what 302 means: Temporary location for content.
  11. Bingo, a brand new page is created (never mind that it does not exist IRL, to Googlebot it does).
  12. Some other Googlebot finds your page at your right URL and indexes it.
  13. When both pages arrive at the reception of the “index” they are spotted by the “duplicate filter” as it is discovered that they are identical.
  14. The “duplicate filter” doesn’t know that one of these pages is not a page but just a link (to a script). It has two URLs and identical content, so this is a piece of cake: Let the best page win. The other disappears.
  15. Optional: For mischievous webmasters only: For any other visitor than “Googlebot”, make the redirect script point to any other page free of choice.
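
The steps above can be sketched as a toy model. This is only an illustration of the logic Schmidt describes, not Google’s actual code: the `WEB` table, the URLs, and the functions `fetch` and `build_index` are all made up for this example, and the “first URL crawled wins” duplicate rule is an assumption standing in for whatever the real duplicate filter does.

```python
# Toy model of the indexing behaviour described in Schmidt's steps.
# Every name and URL here is hypothetical.

# A tiny "web": URL -> (status, payload). For a 200 the payload is the page
# content; for a 301/302 it is the redirect target (the Location header).
WEB = {
    "http://your-domain.tld/": (200, "original article text"),
    "http://hijacker.tld/tracker.php?id=1": (302, "http://your-domain.tld/"),
}


def fetch(url, web):
    """Follow redirects and return (url_credited_with_content, content)."""
    credited = url
    current = url
    for _ in range(5):  # small guard against redirect loops
        status, payload = web[current]
        if status == 200:
            return credited, payload
        if status == 301:
            # Permanent move: the content now belongs to the target URL.
            credited = payload
        # On a 302 the target is only a "temporary location", so the
        # credited URL is left unchanged (steps 9 and 10).
        current = payload
    raise RuntimeError("too many redirects")


def build_index(urls, web):
    """Crawl the URLs in order and apply a naive duplicate filter."""
    index = {}  # content -> URL that survived the duplicate filter
    for url in urls:
        credited, content = fetch(url, web)
        # Assumption: the first URL seen with this content wins (step 14).
        index.setdefault(content, credited)
    return index


if __name__ == "__main__":
    crawl_order = ["http://hijacker.tld/tracker.php?id=1", "http://your-domain.tld/"]
    print(build_index(crawl_order, WEB))
```

In this model, crawling the tracker link first leaves the tracker URL as the only entry for your content. Changing the tracker’s status to 301 in `WEB` makes the same crawl credit your own URL instead.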

How Can I Stop my Pages from Being Hijacked?

Aside from politely emailing webmasters to ask them to remove the redirect to your website, Tony Spencer thinks there aren’t that many ways to stop page hijacking.

…get the other site to remove the HTTP 302 redirect. As I said before most webmasters have no idea of the havoc they are wreaking. I have found that a polite yet firm email nearly always results in a swift removal of the redirect and its often followed by a puzzled reply “Whats the problem?”.

Schmidt doesn’t think there’s a single fix strong enough to prevent your pages from being hijacked. He believes that the error “is generated by the search engines, is only found within the search engines, and hence it must be fixed by the search engines”. He does give some pointers on how to make hijacking harder, but then again these are just things you can do to “slow” down (not stop) hijackers:

  • Always redirect your “non-www” domain (example.com) to the www version (www.example.com) - or the other way round (I personally prefer non-www domains, but that’s just because it appeals to my personal sense of convenience). The direction is not important. It is important that you do it with a 301 redirect and not a 302, as the 302 is the one leading to duplicate pages.
  • Include a bit of always updated content on your pages (e.g. a timestamp, a random quote, a page counter, or whatever)
  • Use the <base href=""> meta tag on all your pages
  • Just like redirecting the non-www version of your domain to the www version, you can make all your pages “confirm their URL artificially” by inserting a 301 redirect from any URL to the exact same URL, and then serve a “200 OK” status code, as usual. This is not trivial, as it will easily throw your server into a loop.
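
As an illustration of the first pointer, here is a minimal sketch of a non-www to www redirect written as a Python WSGI application. It is not taken from Schmidt’s paper: the name `canonical_host_app`, the `CANONICAL_HOST` value, and the plain `http://` scheme are assumptions for the example. Most sites would do the same thing in their web server configuration instead.

```python
# Minimal WSGI sketch: send every request for a non-canonical host name to
# the canonical one with a 301 (permanent) redirect, never a 302.
# CANONICAL_HOST and the http:// scheme are assumptions for this example.

CANONICAL_HOST = "www.example.com"


def canonical_host_app(environ, start_response):
    host = environ.get("HTTP_HOST", "")
    if host != CANONICAL_HOST:
        # Rebuild the same path and query string on the canonical host.
        path = environ.get("PATH_INFO", "/")
        query = environ.get("QUERY_STRING", "")
        location = "http://" + CANONICAL_HOST + path
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    # Already on the canonical host: serve the page as usual.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from the canonical host\n"]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server` from the standard library. The redirect only fires when the host differs from the canonical one, so this version does not loop the way the “same URL to same URL” trick in the last pointer can.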

For those who want to read more about 302 hijacking, there’s a rather long thread on WebmasterWorld (WMW). It’s a good read, though.


Comments:
yuga on July 18th, 2005 at 11:25 AM #

Hi Marc,

Thanks for the heads up. I have forwarded this to the script developers (evoTopSites) and ask them how this issue can be avoided.

Regards,
yuga

Marc on July 18th, 2005 at 12:33 PM #

Thanks Yuga. Appreciate it.

You could remove the redirect and instead place a “normal” HREF link to the website’s URL in tracker.php but that would disrupt the OUT aspect of your ranking formula I think…

Copyright © Macalua.com. All rights reserved.