Have you ever tasked your team with auditing a website, only to have them run it through a crawler and find that the site apparently consists of a single page? That's exactly what we ran into recently with one of our client websites. The client's site was robust, to say the least (at least to the end user viewing it in a browser), so when good ol' Screaming Frog returned just 5 URLs, we knew something was up. It turned out that our client's site was built in Ajax. The next time this happens to you (and it will) and your tech team says a site is uncrawlable, you can direct them to this post! You'll never have to consider an Ajax website uncrawlable again.
Why Does This Matter?
More On Google’s Recently Deprecated Recommendation…
It's all well and good that Google can read and crawl escaped fragments, but many popular crawlers, including Screaming Frog and SEMrush, still cannot. Screaming Frog did add a feature to crawl Ajax, but it only works if a site is configured to follow Google's (now deprecated) recommendation. So if your team is already using something like Prerender.io on your site or your clients' sites, there shouldn't be any problem using Screaming Frog to crawl an Ajax application. These tools work by creating cached HTML snapshots of Ajax pages and serving them to Googlebot when it requests an escaped-fragment URL.
Using something like Prerender isn't terrible; it just serves Google cached pages, which can mean outdated versions of rendered pages get served if the cache isn't kept fresh. Not to mention it costs money and has to be installed, configured, and maintained. That's a lot of work just to make something crawlable, especially if it involves training. So when we discovered our client's site was built entirely in Ajax, we knew we had to come up with a solution to crawl and audit it. Here are the steps we took.
Use Google Analytics, XML Sitemaps & Search Console to Audit Ajax Sites
- Since Screaming Frog can't crawl the site, you'll need to pull URLs from the XML Sitemap(s), which are usually listed in the site's robots.txt.
- Copy and paste the XML Sitemap URL from the robots.txt into your browser.
- Right-click and choose Save As (or File > Save As) and save the file as .html; you can simply change the extension from .xml to .html and click Save.
- Then, open the saved HTML file in Excel as a read-only workbook. You should now have a column of URLs.
- Depending on how many XML Sitemaps exist, you’ll need to repeat the process for each and combine them into one list when finished.*
- If the Sitemaps look complete and you don't need to pull URLs from Google Analytics or Search Console, take your list of URLs, run them through Screaming Frog in List Mode, and audit away!
* If you notice anything funky with the XML Sitemaps, such as a Sitemap that's missing URLs you absolutely know exist on the site, you'll need to gather a list of URLs from Google Analytics and Search Console in addition to the Sitemaps before continuing.
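If you'd rather skip the save-as-HTML/Excel detour, the sitemap-to-URL-list step can also be scripted. Here's a minimal Python sketch that pulls every `<loc>` value out of a downloaded sitemap file; the inline sitemap and the `example.com` URLs are placeholders standing in for your client's actual data.

```python
import xml.etree.ElementTree as ET

# Placeholder standing in for a downloaded sitemap.xml.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about/</loc></url>
  <url><loc>https://www.example.com/contact/</loc></url>
</urlset>"""

# Sitemaps use this XML namespace, so tags must be qualified with it.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(xml_text):
    """Return every <loc> value from a sitemap document, in order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(NS + "loc")]

urls = extract_urls(SITEMAP_XML)
print(len(urls))  # 3
```

Repeat per Sitemap file and concatenate the results into one list, the same as in the manual process.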
To Pull URLs from Google Analytics:
- Log in to GA–>Behavior–>All Pages–>segment by Organic Traffic.
- Make sure your dashboard is set to capture data from the longest timeframe possible.
- Export the URLs. You can only export 5000 at a time manually, or you can use the API to export all.
- If the export only gives you the URIs (page paths), use CONCATENATE to add the domain to each so you have a list of full URLs again.
- Now add these URLs to your list of URLs from the XML Sitemaps and de-dupe.
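If you'd rather not do the CONCATENATE step in Excel, the same fix is a one-liner in Python. A sketch, assuming a hypothetical domain and a GA export that yielded bare page paths:

```python
# Assumption: your GA export produced page paths (URIs), not full URLs.
DOMAIN = "https://www.example.com"  # placeholder for the client's domain

# Sample exported URIs standing in for the real GA export.
uris = ["/", "/about/", "/blog/ajax-crawling/"]

# Prepend the domain to rebuild full URLs (the CONCATENATE step).
full_urls = [DOMAIN + uri for uri in uris]
print(full_urls[1])  # https://www.example.com/about/
```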
To Pull URLs from Search Console:
- Login to Search Console and select Search Traffic–>Search Analytics. Set the date range to capture the largest sample possible and select “Pages.”
- Scroll down and select “Show 500 rows.” Search Console will only export what’s visible. Click Download.
- If you're only given the URIs, use CONCATENATE to add the domain back to each so you have a list of full URLs again.
- Now add these URLs to your list of URLs from XML Sitemaps and Google Analytics.
- De-dupe your entire list.
- Now that you’ve compiled a list of URLs from Google Analytics, Search Console, and any XML Sitemaps you can run it through Screaming Frog in List Mode.
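The combine-and-de-dupe step above can be sketched in Python as well; `dict.fromkeys` removes duplicates while keeping first-seen order, so the final list stays stable between runs. The three input lists here are stand-ins for your real exports:

```python
# Stand-in URL lists from the three sources described above.
sitemap_urls = ["https://www.example.com/", "https://www.example.com/about/"]
ga_urls = ["https://www.example.com/about/", "https://www.example.com/blog/"]
gsc_urls = ["https://www.example.com/", "https://www.example.com/contact/"]

# De-dupe while preserving first-seen order.
combined = list(dict.fromkeys(sitemap_urls + ga_urls + gsc_urls))

# Write one URL per line, the format Screaming Frog's List Mode accepts.
with open("urls-for-list-mode.txt", "w") as f:
    f.write("\n".join(combined))

print(len(combined))  # 4
```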
Using this method, we discovered that a whopping 26% of the client’s top 500 pages (1300 pages total) were faulty 301 redirects requiring immediate attention, among other recommendations. If you have a client in the same scenario, this is a process you should really run through.
Coming Soon: A Custom Crawler From Greenlane Labs
Our dev team is building a custom crawler and site audit tool for Ajax-based websites to use for our enterprise-level clients, and we’re aiming to release it as a free tool in the near future! Our custom crawler will analyze the post-rendered Ajax pages, spidering links and auditing the rendered HTML after the entire page is loaded. Watch this space!
Moral of the Story
Sweet, sweet freedom! Your team’s hands aren’t tied anymore when it comes to crawling and auditing an Ajax application, and you never have to settle for the excuse of “can’t crawl it, built in Ajax,” ever again. Now you have a solid process to find, export, and crawl the URLs of a site built in Ajax. Don’t forget to check back often (or sign up for our newsletter by clicking “Sign Up” at the bottom of this page) for news about the release of our *free* Custom Crawler and Site Audit Tool for Ajax-based Websites.
Do you have your own method of crawling or auditing sites that have been built in Ajax? Share it with us in the comments.