Introducing The Google Indexation Tester

Published: 12.12.16

If you’re not interested in my preamble, just jump to the tool.

Search Console, love it or hate it, has its shortcomings. If you are a technical SEO focusing on index bloat, crawl waste, or anything related to indexation, you know that Search Console only gives you cursory information. The Index Status report is about as high-level as a report could possibly be. Exporting provides no URL detail. If you want to know which URLs are actually indexed, you won’t find out from Google Search Console.

And that’s a longstanding problem technical SEOs have faced. We’ve tried to scrape Google’s search result pages (maybe with ScrapeBox or a similar tool), but we’ve mostly been relegated to checking indexation manually using Google.com. We might lean on search operators until Google starts throwing CAPTCHAs, but even that isn’t always consistent. See, Google doesn’t index everything it knows about. Google also doesn’t show everything it indexes. Sometimes a query that should logically pull a particular URL doesn’t, but if you query it a slightly different way… well, that might work for some reason – with or without search operators. Cats and dogs, living together, mass hysteria.

Check out this example. A former employee previously had a bio page on our website. (We miss you Megan, but you’re doing some really important stuff.) We redirected her page to our default “Meet The Team” page. A keyword search for Megan’s bio doesn’t pull it up as expected. But we improperly (and accidentally) 302’d her page, which is keeping the old URL in Google’s index.
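If you want to catch this kind of slip before Google does, one option is to scan a crawl export for temporary redirects. The snippet below is only a rough sketch and has nothing to do with how our tool works: the function names, the sample URLs, and the idea that your crawler gives you (URL, status code) pairs are all assumptions made for illustration.

```python
# Status codes grouped by redirect type: 301/308 are permanent, 302/303/307 are temporary.
PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 303, 307}


def classify_redirect(status_code):
    """Return 'permanent', 'temporary', or 'not a redirect' for an HTTP status code."""
    if status_code in PERMANENT_REDIRECTS:
        return "permanent"
    if status_code in TEMPORARY_REDIRECTS:
        return "temporary"
    return "not a redirect"


def find_temporary_redirects(crawl_rows):
    """Given (url, status_code) pairs, return the URLs that redirect temporarily."""
    return [url for url, status in crawl_rows if classify_redirect(status) == "temporary"]


# Hypothetical crawl export rows; these URLs and statuses are made up.
crawl_rows = [
    ("https://www.example.com/team/former-employee", 302),
    ("https://www.example.com/old-services", 301),
    ("https://www.example.com/meet-the-team", 200),
]

for url in find_temporary_redirects(crawl_rows):
    print(f"Temporary redirect, old URL may stay indexed: {url}")
```

Anything this flags is worth a second look: if the move is meant to be permanent, switching to a 301 is usually the safer signal.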

[Screenshot: URL indexation]

Basically, Google is remembering the URL, but displaying the destination page. And Google says 302s are the same as 301s, huh? But that’s another story. So how did we discover that? Dumb luck and a lot of poking around in Google search results? Nah, we used the tool I’m about to introduce.

Legend has it that Anthony Moore, face in palm while facing a problem, once said:

[Image: Anthony says]

Well, Sean Malseed was listening. He rolled it around his noodle and went to work on the Greenlane Indexation Tester. When he rose from the cellar (or emerged from the cellar, triumphantly, as he puts it), a new tool was built.

Meet The Google Indexation Tester

Get The Indexation Tester Here

If you know Greenlane, you know we like to build tools when we come across client issues. Then we like to share them for the world to use – check out the Garage for all our free tools.

This newest tool is one of my personal favorites to date. Our Director of Technology, Sean Malseed (who works in our Garage and has his own tools suite at RankTank.org) found a hidden gem – an unpublished API that can be tapped to check indexation.

If you were at my State of Search session in November 2016, I gave you the link to version 1. Now we’re sharing the new and improved version for everyone. It’s in Google Sheets, so you’ll simply need to make a copy. No additional plugins needed. Simply paste in a list of URLs (as values), or upload your XML sitemap, and let it do its thing.
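If your sitemap is sitting in a file and you would rather paste URLs than upload it, a few lines of Python can pull out the `<loc>` values for you. Treat this as a sketch, not as part of the tool: the function name `urls_from_sitemap` and the sample URLs are made up, and it assumes a plain `<urlset>` sitemap in the standard sitemaps.org namespace (not a sitemap index file).

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> values of a <urlset> sitemap, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        # Skip empty <loc> elements and trim stray whitespace.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical two-URL sitemap, for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc> https://www.example.com/blog/ </loc></url>
</urlset>"""

# One URL per line, ready to paste into a spreadsheet column as values.
print("\n".join(urls_from_sitemap(sample_sitemap)))
```

Printing one URL per line keeps the output easy to paste straight into the sheet.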

Note: This tool gets data without scraping Google search results, and without using your IP address!

Check out the screenshot or video tutorial below. Any questions? Email us at help@greenlaneseo.com.

[Screenshot: indexation tester]

Watch The Video Tutorial

Bill Sebald
Follow me on Twitter - @billsebald

I've been doing SEO since 1996. Blogger, speaker, and teacher at Philadelphia University. I started Greenlane in 2005 to help clients leverage search marketing to hit business goals. I love this stuff.

  • Fantastic tool, great for checking eCom sites for potential issues with deep indexation. I appreciate you putting this tool out there.

    • Hi Jeremy, I trust you’ve been well. Thanks so much for the comment. Love it.

  • Bruno

    Hi Bill. Thanks for that. It’s great. I wonder how it checks whether a URL is indexed or not. My reference article on this subject is http://urlprofiler.com/blog/google-index-checker/. Is your method as reliable?

    What’s more, you have to use proxies to check in bulk, right? How can your tool be free? Proxies are not!

    I’m curious 🙂 (and really thankful)

    • Sean Malseed

      Hey Bruno! The tool uses a Google API to check. It doesn’t need proxies, because Google Sheets sends HTTP requests from Google internal IPs instead of the user’s IP.

      Thanks!

    • Sean

      Hey Bruno! You don’t need to use a proxy. HTTP queries from Google Sheets actually use an internal Google IP address that changes constantly – like a built in proxy 🙂

    • Bruno, you reminded me when you mentioned URLprofiler.

      I adore that tool and the URLprofiler team. I actually forgot that tool can check for indexation too. So as far as I know, only URLprofiler and our tool can do this. For those reading this, please put URLprofiler on your radar if it isn’t already.

  • The tool returns wrong indexing information for at least 25% of URLs: non-indexed URLs are shown as indexed and vice versa. I tested it with 100 URLs, then checked them again with iMacros.

    • Sean Malseed

      Hey Evgeniy, thanks. Would you mind giving a few examples of each scenario? I’d love to squash any bugs 🙂 You can email me at sean@greenlaneseo.com

  • Hey Bill!
    Thank you for that great tool. I love it.

    Unfortunately, there really are some false positives. Do you have any idea how to make the tool more reliable?
    Is it possible to let the task run slower? That might help.

    thanks
    Jochen

  • Robert Kirk

    When I saw this tool mentioned in the pointblankseo email newsletter, I thought: awesome, just what I needed! At first glance the tool looked great. But on double-checking some of the results, it unfortunately does not seem to be 100% accurate, which is a shame, as it would be a great tool to have when many others are not reliable.

    • Robert Kirk

      Hi Sean, thanks for taking time to come back to me. Just running some links through it now, seems to be working fine for me now. I will do more testing, but thanks very much for the tool!

      • Sean

        Thanks Robert! Let me know!

  • Ekaterina Petrakova

    hello!

    Great idea, and I tried the tool several times (including the last update on the 19th of December), but I always get wrong information for several URLs, e.g. https://www.campsy.de/nl/camping-drenthe
    The page is indexed, and the cache shows it as indexed as well, but the tool returns a noindex result.

    PS: It would be nice if the results could also flag a URL as not available or redirected. Sitemaps are sometimes out of date, so we get the wrong information that a page is noindex when it no longer exists.

    thanks

    • Agree, so what we do is add this data to a Screaming Frog export as well.

      • Ekaterina Petrakova

        yes, but the first and the main question is how can you fix it in this tool, not pros of screaming frog 🙂
        the page is indexed but shown as not indexed: can you, please, investigate what could be the issue?

        • Sean

          Hi Ekaterina, would you mind emailing me a few examples? sean@greenlaneseo.com

          • Ekaterina Petrakova

            hi Sean. sent them via email

  • Hey Bill, great tool, I started to play with it and so far I find it pretty reliable 🙂

    Was just wondering if the results returned correspond simply to the “indexation status” (like, Google just scraped and indexed the page) or the actual index users are seeing, i.e. the URL is being displayed in SERPs.
    I’m asking because Search Console is telling me that almost 90% of the URLs on my website are being indexed, while your tool is giving me a completely different result – only 40 to 50% of the URLs are indexed. Is it normal to have such a great difference? Was it proven by your tests as well?

    I would like to back up my business case with some more data in order to have my client take action, that’s why I’m asking.

    • Since it’s pulling from a Google API, it’s pinging against “indexed” pages in Hummingbird. I don’t believe it’s hitting against knowledge graph data or other sources.

      Since Google has a tendency to not display everything it knows about for a query, I’ve actually been finding more URLs indexed here than what I can get google.com to show me. I also find Google Search Console to be wildly off in terms of indexed pages (via the Sitemaps report). We just ran a sitemap check through the tool and put it up against what Google Search Console shows. In this case, the tool was more accurate.

      So with that said, you’re having an opposite result. I have not heard of that yet. Please email bill@greenlaneseo.com and we’ll see if we can troubleshoot. Thanks for the comment.

  • Luis Andrianto

    Thank you for sharing the great tools.

  • PublicaLog

    Hi, the tool is marking some URLs as not indexed, but they are in the index, some with a cache and others without. We recently migrated the website to HTTPS, so is it possible that these URLs are not indexed in some data centers? Which data centers is this tool checking? Thanks!

  • What does it mean if it successfully tests 5 or so URLs and then stops with a “?” in the Indexed results column?

  • AK freelancer

    Great tool, Thanks for sharing this valuable post.
    http://www.a1webservice.com

  • Thanks – super helpful tool….
    Would it be possible to tweak it so it can also test the indexation of URLs in Google News? (News quite often rejects articles as being “fragmented”, so it would be really helpful to see this for a number of URLs.) Thanks!

  • Christoph C. Cemper

    Hey Bill & team

    just saw this shared on my feed.

    Regarding this
    >>>Note: This tool gets data without scraping Google search results, and without using your IP address!
    we were curious, checked it out and found a
    very straightforward Google SERP scraping in the code,
    which of course DOES use our IP address and has all the usual Captcha/automated queries issues next to it.

    It’s all in function isIndexed(page,ignoreCase)

    Am I missing something, or where is that API you referred to?

    On another note – similar functions are not only in URLprofiler but e.g. also in SEOtools for Excel by Niels Bosma and his team.

    Please let me know
    Christoph

© 2017 Greenlane. All rights reserved.
