Failing Reinclusion Requests? How To Uncover Those “Harder To Find” Links.
Sometimes desperate times call for desperate measures. This post is about a desperate measure.
We had a client with a manual link penalty. We did some work (using my outline from this post). Rankings started going up and traffic/conversions started boosting. Then, a few days later, the next Google Notification came in. It’s like playing digital Russian Roulette with those things – you’ll either be thrilled or be in a lot of pain.
This time Google said they “changed” our penalty, as there were still some spammy links out there.
Remember, not all penalties have the same impact. Clearly ours was lessened (which was continually proven in the weeks to follow), but our client – rightfully so – wanted to have the whole penalty removed. The problem was we couldn’t find anymore bad links. Everything from Ahrefs, OSE, Google Webmaster Tools, Bing Webmaster Tools, and Majestic (etc.) was classified and handled appropriately.
Google’s notifications sometimes show some additional samples of poisonous links. This time we were showed only two links of forum spam, something we found zero instances of previously. Old school, dirty forum spam usually is belched out in huge, automated waves. We asked the client, who asked their previous vendors, if they had any knowledge of the link spamming. Nobody knew anything about it, so any chance of getting a list of these URLs (which was probably very low anyway) was now nil. But how did we miss all of it?
The problem was, this forum spam was so deep in the index that the major tools couldn’t find them. Even Google’s Webmaster Tools report didn’t reveal them. That’s right – Google’s notification was showing us links existing, but weren’t even giving us insight into those links through Webmaster Tools. They never got any clicks so we weren’t finding them in Google Analytics. Google’s vague link reporting functions and vague, boilerplate notifications weren’t helping us help them.
The only way to find these deep links was through the use of Google’s search engine. Unless you have a staff of hundreds and nothing but time to manually pull results and analyze one by one, this didn’t seem possible. But we came up with with a reasonably easy process using Cognitive SEO, Scrapebox, Screaming Frog, and good old Excel, to try to emulate this activity with at least some success.
Note: I feel obligated to tell you that this is not going to be an exhaustive solution. I don’t think there is one. There’s limitations to what Google will actually serve and what the tools listed in the post can actually do. To give you some good news, Google will likely release you from a penalty even though you didn’t clean up every single spammy link. All the clients I’ve gotten out of the doghouse still had some spam out there we weren’t able to find. To Google’s credit, at least they seem to understand that. Hopefully this process will help you out enough to get the job done when your repeated reinclusions are denied (even after really, really trying).
Determining the footprints
We’re going to have to beat Google into giving us opportunity. The problem is, we’re going to get a serious amount of noise in the process.
We know the inanchor: operator can be helpful. It’s not as powerful as we’d like, but it’s the best we have. A search in google like inanchor:”bill sebald” will ask Google to return sites that link using “bill sebald” as anchor text. This will be very valuable… as long as we know the anchor text.
Step 1. Get the anchor text
This can be done in a few ways. Sometimes your client can reveal the commercial anchors they were targeting, sometimes they can’t. All the major backlink data providers give you anchor text information. My favorite source is Cognitive SEO, because they give you a nice Word Cloud in their interface right below their Unnatural Link Detection module (see my previous post for more information on Cognitive).
Collect the anchor text, paying special attention to any spammy keywords you may have. I would recommend you review as many keywords as possible. Jot them down in a spreadsheet and put them aside. Don’t be conservative here.
You also want to be collecting the non-commercial keywords. Like, your brand name, variations of your brand name, your website URL variations, etc. Anything that would be used in a link to your website referencing your actual company or website.
Together you’ll get a mix of natural backlinks and possibly over-optimized backlinks for SEO purposes. We need to check them all, even though the heavily targeted anchors are probably the main culprit here.
Get The Results
This is where Scrapebox comes in. I’m not going to give you a lesson (that’s been done quite well by Matthew Woodward and Jacob King). But if you’re not familiar, this powerful little tool will scrape the results right out of Google, and put them in a tabular format. You will want proxies or Google will throw captchas at you and screw up your progress. Set the depth to Scrapebox’s (and Google’s) max of 1,000, and start scraping.
Step 1: Enter in your queries
In the screenshot example below, I entered one. Depending on results, and how many commercial anchor text keywords you’re looking for, you want to add more. This might require a bunch of back and forth, and exporting of URL’s, since you have a limitation in how much you can pull. I like small chunks. Grab a beer and put on some music. It helps ease the pain.
But don’t just do inanchor: queries. Get creative. Look for your brand names, mentions, anything that might be associated with a link.
Step 2: Choose all the search engines as your target
In most cases you’ll get a lot of dupes, but Scrapebox will de-dupe for you. In the errant case where Bing might have some links Google isn’t showing, it may come in handy. Remember – Google doesn’t show everything it knows about.
Step 3: Paste in your proxies
It seems Google is on high alert for advanced operators en masse. I recommend getting a ton of proxies to mask your activities a bit (I bought 100 from squidproxies.com, a company I’ve been happy with so far. H/T to Ian Howells)
Step 4: Export and aggregate your results
After a few reps, you’re going to get a ton of results. I average about 15,000. Scrapebox does some de-duping for you, but I always like to spend five minutes cleaning this list, filtering out major platforms like Youtube, Yahoo, Facebook, etc, and removing duplicates. Get the junk out here and have a cleaner list later.
Find The Links
Got a huge list of webpages that may or may not have a link to you? Wouldn’t it be great to find any links without checking each page one by one? There is. Screaming Frog to the rescue.
Copy and paste your long list out of Excel and into a notepad file. Save as a .txt file. Then, head over to Screaming Frog.
Choose: Mode > List
Upload your recently created .txt file.
Then choose: Configuration > Custom
Enter in just the SLD and TLD of your website. See below:
Now when you click start, Screaming Frog will only search the exact URL in your text file, and check the source code for any mention of yoursite.com (for example). In the “custom” tab, you can see all the pages Screaming Frog found a match. Be careful, sometimes it will find hyperlinks that aren’t actually linked, email addresses for you, or hotlinked images.
Boom. I bet you’ll have more links than you originally did, many of which are pulled from the supplemental hell of Google’s index. Many of these are in fact so deep that OSE, Ahrefs, Majestic, etc., don’t ever discover them (or they choose to suppress them). But, odds are, Google is counting them.
The (Kinda)Fatal Flaw With This Procedure
Remember earlier when I said this wasn’t a perfect solution? Here’s the reason. Some of these pages that Google shows for a query are quite outdated, especially the deeper you go in the index. In many cases you could grab any one of the URLs that you found that did not have a link to your site (according to Screaming Frog), and look at the Google cache, then find the link. Did Screaming Frog fail? No. The link has vanished since Google last crawled the URL. Sometimes these deeply indexed pages don’t get crawled again for months. In a month the link could have been removed or been paginated to another URL (common in forum spam). Maybe the link was part of an RSS or Twitter feed that once showed in the source code but has since been bumped off.
The only way I know to overcome this takes a lot of processing – more than my 16gb laptop even had. Remember the part where you upload the full list of URLs into Screaming Frog in list mode? Well, if you wanted to pull of the governers, you could actually crawl these URLs and their connected pages as well by going to Configuration >Spider>Limits and remove the limit search depth tick, which applies a crawl depth of ‘0’ automatically when switching to list mode. I was able to find a few more links this way, but it is indeed resource intensive.
Has It Really Come To This?
This is an extreme example on rare cases.
Yesterday we had a prospect call our company who was looking for a second opinion. Their site had a penalty from some SEO work done previously. The current SEO agency’s professional opinion was to burn the site. Kill it. Start over. My gut-second opinion was that it should (and could) probably be saved. After all, there’s branding on that site. The URL is on their business cards. It’s their online identity and worth a serious attempt at rescue. In this case I think extra steps like the above might be in order (if it should come to that). But if it’s a churn-and-burn affiliate site, maybe it’s not worth the effort.
Post-penguin we find that removing the flagged links, with the parallel event of links just becoming less and less valuable as the algorithm refines itself, does keep rankings from bouncing completely back to where they were before – in most, but not all, cases. That’s a hard pill for some smaller business owners to swallow, but I have never seen a case of penalty removal – where all the levels of rank affecting penalty were removed – keep a site from never succeeding in time. Time being the keyword.
So yeah, maybe it really has “come to this,” If your site is worth saving. At the very least you’ll be learning your way around some incredible powerful tools like Scrapebox, Cognitive SEO, and Screaming Frog.
I’m excited to see if anyone has a more refined or advanced way to achieve the same effects!
- How Mission Marketing Can Improve Your SEO July 2, 2015
- How to Find Old Redirect Opportunities & Reclaim Links (w/The WayBack Machine) June 22, 2015
- Republishing Your Content May Still Be Dangerous May 29, 2015
- All Your Content Doesn’t Matter Without Meaning May 6, 2015