Wildcards in Robots.txt


One of the greatest (and mostly unknown) abilities of robots.txt is wildcard pattern matching. We know robots.txt can block files and directories from being crawled, but did you know that Google and Yahoo also respect wildcards? That makes it especially useful for URLs with unique parameters and the duplicate content issues they create. (This was verified by connections at the engines, though MSN said they do not respect pattern matching “at this time”.)

If you have URLs with unique parameters – for example, UTM tags from Google Analytics, paid search tags, and so on – you can create a robots.txt entry like this:

User-agent: *
Disallow: /*utm
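
Google’s documentation of these pattern-matching extensions also describes a $ operator, which anchors a pattern to the end of the URL. As a sketch of how the two combine, here is a rule that blocks only PDF files (Yahoo’s support for $ is less clearly documented, so treat this as a Google-specific example):

User-agent: Googlebot
# * matches any run of characters; $ pins the match to the end of the URL
Disallow: /*.pdf$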

How cool is that? Remember, this should only be employed if your parameters are truly unique. If your parameters are keyworded, and that keyword also appears in directory or page names, those will get blocked too… quite possibly to your dismay.
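
To make that risk concrete, here is a hypothetical sketch. Suppose your tracking parameter is utm_source, but your site also has a directory named /utmost-deals/ (both names are made up for illustration). Anchoring the pattern to the query string avoids the collision, per Google’s documented matching rules:

User-agent: *
# Too broad: this would block /utmost-deals/ along with the tracking URLs
# Disallow: /*utm
# Tighter: only matches "utm_" appearing after a ? in the URL
Disallow: /*?*utm_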

More from Google’s Webmaster Blog.



    Comments

    The comments are do-follow. However, any comments that use keyword anchor text as the name will be removed.

    1. Bill
      April 22, 2008

      My contact is not Matt Cutts. But I did discover this on the Google Blog to help add validity.

    2. seojedi
      April 22, 2008

      Bill, you just saved me tons of hours coding conditional redirects!

    3. Premal
      April 22, 2008

What is Matt Cutts’ email address? I assume he’s your contact, eh?
