How Do I Check My Canonical Tags?
- Crawl your website with Screaming Frog to get data on all your URLs. Let the crawler reach every page you have open to search engines.
- Once the crawl finishes, open the “Directives” tab, filter by “Canonical”, and click Export.
- In Excel, add =IF(A4=B4,"Equal","Not Equal") in a new column to quickly spot which pages canonicalize to a URL other than themselves.
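If you'd rather script the comparison than do it in Excel, here is a minimal Python sketch of the same check. It assumes the export is a CSV with columns named Address and Canonical Link Element 1; those headers are an assumption, so check them against your own file before running it.

```python
import csv
import io
from typing import List, Tuple

# Assumed Screaming Frog export headers; adjust to match your file.
URL_COL = "Address"
CANONICAL_COL = "Canonical Link Element 1"


def label_rows(csv_text: str) -> List[Tuple[str, str, str]]:
    """Mirror =IF(A=B,"Equal","Not Equal") for each row that has a canonical tag."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(URL_COL) or "").strip()
        canonical = (row.get(CANONICAL_COL) or "").strip()
        if not canonical:
            # No canonical tag on this page, so there is nothing to compare.
            continue
        status = "Equal" if url == canonical else "Not Equal"
        results.append((url, canonical, status))
    return results


if __name__ == "__main__":
    sample = (
        "Address,Canonical Link Element 1\n"
        "https://example.com/a,https://example.com/a\n"
        "https://example.com/b,https://example.com/\n"
        "https://example.com/c,\n"
    )
    for url, canonical, status in label_rows(sample):
        print(f"{status}: {url} -> {canonical}")
```

One difference to keep in mind: Excel's = operator ignores letter case, while Python's == does not, so URLs that differ only in case show up as Not Equal here but Equal in the spreadsheet.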
The canonical tag can be a blessing or a curse. It lives in the <head> section of your webpage, and its purpose is to suggest to search engines which URL is the proper canonical page. The “canonical” page is the original page, or the page you want to represent as your “main” document. Notice I said “suggest to search engines.” That’s because Google and Bing won’t take this tag as a directive; instead, they’ll consider it a hint. Here’s more on the canonical tag definition: http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html.
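To see what the tag looks like to a crawler, here's a small sketch using Python's built-in html.parser that pulls the href out of any <link rel="canonical"> inside <head>. It's an illustration, not how Google or Screaming Frog actually parse pages; in particular, it only looks inside an explicit <head> element.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collects href values from <link rel="canonical"> tags inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr_map = dict(attrs)
            # rel can hold several space-separated values, so split before checking.
            rels = (attr_map.get("rel") or "").lower().split()
            href = attr_map.get("href")
            if "canonical" in rels and href:
                self.canonicals.append(href)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str) -> Optional[str]:
    """Return the first canonical URL in <head>, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    # More than one canonical on a page is worth flagging on its own.
    return finder.canonicals[0] if finder.canonicals else None


page = '<html><head><link rel="canonical" href="https://example.com/shoes/"></head><body></body></html>'
print(find_canonical(page))  # https://example.com/shoes/
```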
So What’s The Big Deal?
The reason Google doesn’t accept the canonical tag as a directive is probably that they know many webmasters will screw it up (e.g., syntax, implementation, etc.). If you have a massive database-driven eCommerce site and you’ve tried to get a developer team to implement canonical tags, you’ve seen how it can ultimately launch with a ton of unexpected results. Examples I’ve seen: via templates, products suddenly “canonicalizing” to the homepage, and page 4 of a collection suddenly canonicalizing to page 1 of the collection. Crazy, random results are likely if the tag isn’t implemented and QA’d properly. When the tag was announced in February 2009, I worked for one of the largest eCommerce platforms at the time. We wanted to be first to offer it, and we rushed it out, with many, many problems. I’ve always had a love/hate relationship with this tag.
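Template failures like these tend to follow recognizable patterns, so once you have URL/canonical pairs you can flag them automatically. This is a hypothetical rule-based sketch: the `page` query parameter for pagination is an assumption about how your site builds collection URLs, and the rules only catch these two patterns.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical: assumes collection pages paginate with ?page=N.
PAGE_PARAM = "page"


def suspicious_reason(url: str, canonical: str) -> Optional[str]:
    """Return why a URL/canonical pair looks like a template mistake, or None."""
    src, dst = urlsplit(url), urlsplit(canonical)

    # Pattern 1: an inner page canonicalizing to the homepage.
    if src.path not in ("", "/") and dst.path in ("", "/") and not dst.query:
        return "canonicalizes to homepage"

    # Pattern 2: page N of a collection canonicalizing to page 1
    # (same path, page parameter missing or set to 1 on the canonical).
    src_page = parse_qs(src.query).get(PAGE_PARAM, ["1"])[0]
    dst_page = parse_qs(dst.query).get(PAGE_PARAM, ["1"])[0]
    if src.path == dst.path and src_page != "1" and dst_page == "1":
        return f"page {src_page} canonicalizes to page 1"

    return None


for pair in [
    ("https://example.com/products/guitar-123", "https://example.com/"),
    ("https://example.com/collection?page=4", "https://example.com/collection"),
    ("https://example.com/collection", "https://example.com/collection"),
]:
    print(pair[0], "->", suspicious_reason(*pair))
```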
Also, in the beginning, it almost seemed as if Google took their time responding to the tag. Logs would show Googlebot visiting over and over again, but the tags would never take. Then, on one random visit, boom! This seems less common now, but it still happens. The most frustrating thing, though, is that sometimes Google doesn’t honor the canonical tag at all, even when it makes all the sense in the world to do so. I’m currently working on a site with 1.5 million indexed URLs. A third of them are canonicalized URLs, so there’s a lot of duplicate content. The canonical tag is implemented correctly, but Google just hasn’t bitten yet. It’s been 6 months. In Google’s infinite algorithmic wisdom, it appears we have to do more to influence them.
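If you want to watch for the same thing in your own logs, a rough approach is to count Googlebot requests per URL. This sketch assumes your server writes the Apache/Nginx combined log format, and it only matches the user-agent string, which can be spoofed, so treat the counts as approximate until you verify the hits really come from Google.

```python
import re
from collections import Counter
from typing import Iterable

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per path whose user-agent claims to be Googlebot."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # The user-agent is self-reported; confirm real Googlebot separately.
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


sample = [
    '192.0.2.10 - - [10/Oct/2013:13:55:36 -0700] "GET /product-1?color=red HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2013:13:56:01 -0700] "GET /product-1?color=red HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(googlebot_hits(sample).most_common(10))
```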
OK, so the canonical tag is far from perfect. It’s the imperfection you, as an SEO, want to plan for.
How To Audit Your Canonical Tags
You want to see whether your canonical URLs were planned correctly and validate your canonical tags. You’re going to need two things: Screaming Frog (if you have a large site, the free version won’t cut it) and a spreadsheet.
Step 1: Crawl your entire site with Screaming Frog. Give yourself a liberal crawl. Under Configuration > Spider, I typically respect noindex but force SF to follow rel="nofollow" links. I also don’t respect canonicals, because I want to capture all the duplicate content. Once the crawl is done, open the “Directives” tab, filter by “Canonical”, and click Export.
You’ll get a shiny new Excel file that, after cleanup, has each crawled URL in column A and the canonical URL it declares in column B.
Step 2: Let’s compare column A to column B and see where the mismatches are. The magic formula to paste into column C (then fill down) is: =IF(A4=B4,"Equal","Not Equal")
Next, filter column C to show only “Not Equal” rows. These are the pages whose canonical tag points somewhere other than themselves.
Let’s examine the first result. The spreadsheet tells us this page: http://www.guitarcenter.com/JBL-EON500-Series-g5076t1.gc has a canonical tag pointing to this page: http://www.guitarcenter.com/Search/Default.aspx?pcid=5076.
In other words, the first page is telling Google not to index it, but to index the /Search/Default.aspx?pcid=5076 URL instead.
If Google decides to follow this canonical tag, that would be bad. So I checked whether http://www.guitarcenter.com/JBL-EON500-Series-g5076t1.gc was indexed. It’s not. Unless there’s a bigger picture I’m not privy to, this is a flawed canonical tag implementation.
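Checking rows one at a time works, but on a big site it helps to see which canonical targets attract the most mismatches: one URL receiving hundreds of canonicals is often a template problem like the ones described earlier. A minimal sketch, assuming you've loaded the spreadsheet rows as (url, canonical, status) tuples:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_canonical_targets(
    rows: Iterable[Tuple[str, str, str]], n: int = 10
) -> List[Tuple[str, int]]:
    """Count how many "Not Equal" pages point at each canonical URL, busiest first."""
    targets = Counter(
        canonical for _url, canonical, status in rows if status == "Not Equal"
    )
    return targets.most_common(n)


rows = [
    ("https://example.com/product-1", "https://example.com/", "Not Equal"),
    ("https://example.com/product-2", "https://example.com/", "Not Equal"),
    ("https://example.com/shoes?color=red", "https://example.com/shoes", "Not Equal"),
    ("https://example.com/shoes", "https://example.com/shoes", "Equal"),
]
print(top_canonical_targets(rows))
```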
Canonical Tag vs 301 Redirect
A quick relevant comment: I always urge clients to use 301 redirects instead of hoping the canonical tag works. Google says the canonical is just like a 301 redirect. Fantastic… if only it were a directive. Canonical tags are great default solutions, or safety nets, for a website that is difficult to work with, but in my opinion, given the option, a 301 redirect is always the preferred method.
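For illustration, here's what a 301 looks like at the server level: a minimal WSGI app in Python that permanently redirects duplicate paths to their canonical URL. The REDIRECTS mapping and its URLs are hypothetical, and on a real site you'd more likely configure this in your web server or platform.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical mapping of duplicate paths to their canonical URLs.
REDIRECTS: Dict[str, str] = {
    "/old-product": "https://example.com/new-product",
    "/products/old-product": "https://example.com/new-product",
}

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Send a permanent redirect for known duplicates; serve everything else."""
    target = REDIRECTS.get(environ.get("PATH_INFO", "/"))
    if target:
        # 301 Moved Permanently: the duplicate URL never serves content of its own.
        start_response("301 Moved Permanently", [("Location", target), ("Content-Length", "0")])
        return [b""]
    body = b"OK"
    start_response("200 OK", [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() will serve it. Note that PATH_INFO excludes the query string, so this sketch only matches on the path.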
This is a relatively easy way to spot outliers. If you have any questions, or want to know more about canonical tag best practices, let me know in the comments!