Introducing The Simple SEO Site Audit Tool

Last week our team attended a local (Philadelphia) SEO meetup where there was a presentation from local wiz Sean Malseed of Circlerank. The presentation was called “Build Your Own Damn (SEO) Tools With Google Apps.” He showed us how to use Google Sheets for scraping and pulling API data to build your own custom tools. He also shared his own site that has some really incredible tools ready for free use:

Rank Tank

Google Drive (notably Google Sheets) is the free, cloud-based, web-based competitor to Microsoft Office – but with the added ability to scrape webpages using XPath (a query language) and the IMPORTXML function.

We use this ability for our client reports already, but Sean definitely opened our eyes to more of the advanced abilities of Google’s spreadsheets. After the presentation, I shared a few ideas for tools I’d love to see that (now) appeared possible with Google Apps. Sean seemed to be fond of the ideas – so much so that he’s already created the first one: the Simple SEO Site Audit Tool.

What Can The “Simple SEO Site Audit Tool” Do?

Sure, there are plenty of plugins and paid tools that can give you a quick technical and tag audit of a given URL. However, not many (with the exception of the desktop apps URLprofiler and Screaming Frog) can give you a quick audit in bulk, let alone right in your browser. It turns out that with XPath, Google Sheets can.

What I really wanted to see was a tool that could audit a website and call out the schema that was implemented per URL.  I wanted to be able to find the URLs that had articles, and see if they were properly marked up. I wanted to be able to find URLs that had videos, and see if they were properly marked up.  I think you get the idea…

Now, my dream tool would be able to look at the page and say, “yup – this is an article. I’m certain. And it does not have the appropriate schema.” The problem is a scraper can’t be certain what the assets truly are. There’s no 100% universal footprint to scan for. So, we decided to list out the title, description, and headers of each page to hopefully help the end user recognize the kind of page it is. Many times the title of a page can easily suggest whether it’s a video, a blog post, a product, etc.
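To make that concrete, here’s a rough sketch (in Python, standard library only) of the kind of per-URL extraction the sheet performs. In Google Sheets itself, the same XPath expressions would go into IMPORTXML formulas, e.g. `=IMPORTXML(A2, "//title")`. The sample page and field names here are illustrative, not the tool’s actual output:

```python
# A sketch of the per-URL checks the audit sheet performs, using Python's
# stdlib ElementTree and its (limited) XPath support. Assumes well-formed
# XHTML-ish markup for simplicity.
import xml.etree.ElementTree as ET

def audit_page(raw_html):
    root = ET.fromstring(raw_html)
    title = root.find(".//title")
    desc = root.find(".//meta[@name='description']")
    return {
        "title": title.text if title is not None else None,
        "description": desc.get("content") if desc is not None else None,
        "h1_count": len(root.findall(".//h1")),
        "h2_count": len(root.findall(".//h2")),
        # schema.org microdata shows up as itemtype attributes
        "schema_types": [el.get("itemtype") for el in root.iter()
                         if el.get("itemtype")],
    }

page = """<html><head><title>My Video Post</title>
<meta name="description" content="A short demo." /></head>
<body itemtype="http://schema.org/VideoObject">
<h1>Watch This</h1><h2>Details</h2><h2>Credits</h2></body></html>"""

print(audit_page(page))
```

A title like “My Video Post” plus a VideoObject itemtype is exactly the kind of pairing (or mismatch) the audit makes easy to eyeball.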

And with that, the Simple SEO Site Audit Tool was born.

Since we’re dealing with Google Drive here, you need to make a copy of the spreadsheet.  Then paste in a list of URLs, or paste in your XML sitemap. Within seconds you’ll get a great overview of your site, like this snippet example (column J is my favorite!):



Pretty outstanding. Now you can target URLs that need updating in terms of schema, tags, and their overall messaging. With just a quick review of the results above, I can see I’ve been lazy on my blog posts’ meta descriptions, a touch heavy on my H1 tags (and even heavier on my H2s), and that my judicious use of schema seems accurate.

This is something you can use for schema audits, quick audits to compare against a competitor, or as data to review on a sales call.


A huge thanks to Sean for the brains and speed. He and I will be putting together a few more tools, so watch this spot. And make sure you check out Rank Tank for more of Sean’s work, or go to Sean’s landing page for the tool.

How To Work Relationships and Concepts Into Your Copy

It’s an exciting time in the SEO industry. Many things have changed – by now, much of the industry has acknowledged and begun an evolved form of SEO practice. Google has hit adolescence; it’s entered Junior High School.

The folks at Mountain View made the conscious decision that keywords alone couldn’t deliver them the results they wanted to see (ahem, “their users wanted to see”). Google tried some different modeling, but ultimately came around to semantic search (that is, using semantic technology to refine the query results). Now, I said much of the industry has picked up on it. Not all. I still see a lot of pretending that Panda, Penguin, and Hummingbird never happened. That’s unfortunate for innocent clients around the world. But most of us reading this are students of a new lexicon, with words like “triples” and “entities” and “semiotics” and “topic modeling.”

Rand Fishkin did a phenomenal WBF called What SEOs Need to Know About Topic Modeling & Semantic Connectivity. This truly made my month. At Greenlane we’ve been producing what we internally call “Hummingbird Content” for our clients. It’s very much related to Rand’s WBF, and very tied in with our own February 2014 post called Optimize NOW For Entities and Relationships and July 2014 post called How Entities and Knowledge Cards Can Help With Query Intent.  Personally I was fascinated with Latent Semantic Indexing, though it didn’t really seem to ever evolve (or maybe it hit its limitations?)… but I believe Hummingbird could be the reboot I’ve been waiting for. This is Google’s biggest AI improvement. This is Google starting to finally understand intent (after over a decade of SEOs optimizing like they could). This is Google’s second brain starting to evolve and merge with their core functionality. Maybe the world didn’t get flying cars yet, but I think we are starting to get the Google results we want.

The Bigger Picture

Winning knowledge cards for clients is fun, and has been valuable for a few of ours. But the knowledge cards are only a product overlay of what Google is now starting to comprehend. It’s (in large part) about relationships – in this regard, different from the link-building kind, but related to meaning and language learning. If SEO is about helping Google recognize value, it’s high time we move past just writing for users and keywords, and start writing for users, keywords, topics, and relationships. We can use everything at our disposal that we think ties in (until more testing or admissions from Google steer us in another direction), from schema to Freebase to writing to data analysis – put it all into your SEO. Marcus Tober did a presentation at SMX East 2014. It was the best demonstration of Hummingbird optimization I’ve seen yet, with help from a stage-setting slide (“Hummingbird Update,” from the SearchMetrics SMX East 2014 deck). Marcus’ presentation went on to show data. Lots, and lots of data. Naturally, you take notice when you see a slide like the “Holistic Copy” slide from that same deck. Make sure you view this presentation:

It’s All Coming Together

Why do the Moz and SearchMetrics ranking studies show a correlation between long copy and better rankings? It’s not because of word count. It’s because you’re casting a bigger net to show – and teach – Google about your expertise and topic. If Google (post-Hummingbird) understands a million relationships already, you’re bound to strike a few connections with the ranking algorithm. If you start worrying less about exact-match keywords (where, let’s face it, stuffing really didn’t have much effect as of late anyway) and more about the relationships, you’re starting to appeal to the new Google. If you were able to make your entity known (via Freebase, Wikipedia, etc.), that’s more relationships you’re able to throw into the boil. I’m no data scientist. I’m no Bill Slawski either. But I do think I see the big picture, and the direction we should be heading regarding content creation seems pretty clear.

So What Do We Do?

Writing content for clients, targeting organic search, has never been a walk in the park (except in the days of text spinners, I suppose). Keyword research skills and experience as a writer originally got the job done – if the job were simply getting rankings. We considered the values of evergreen vs. temporal. We considered writing for a single keyword and having lots and lots and lots of SEO landing pages – you know, because “Content is King,” yada yada.

But content strategy started working its way into the SEO space. These efforts tied into better on-site metrics that may or may not contribute to rankings. For marketers, content’s role in attribution and funnels started to be thought of more critically, with SEO fully blended in. SEOs started meeting with the other marketers in the organization. Thus, the SEO copywriters who got all of this were more successful.

Now I argue there’s a new element to add – writing more towards topics and concepts – thus creating relationships to appeal to Google, while never forgetting the keywords and traditional SEO and marketing copywriting prerequisites (above). The handwritten graphic we used in this post is real. In addition to goals, purpose, tone, style, and keywords, we sketch out relationships before we sit down and write. Yup – it’s an added step that isn’t necessarily quick, but it makes for a much better piece versus sitting down and hustling out a quickly crafted outline. To somewhat illustrate what we’re currently doing:


50% of our time goes into really developing the topic and concepts. Sometimes it’s an existing page with an existing topic; sometimes it needs to be created or narrowed down. Think of a robust Wikipedia entry – in many cases that’s a great way of looking at topic and concept relationships.

Next, what concepts tie into your meanings? Again, think of a Wikipedia entry – its H2s can often be seen as concepts related to the topic. In many cases, relationships are naturally born out of this. They come without much effort.

The other 50% goes into expanding your idea with traditional keyword research (which is more easily inspired by the first half of this exercise – seriously, try it!). But you still have to tie your topic and concepts into the reason for writing in the first place. What is everything this piece should do? Once you’ve jotted down your ideas (you’ve likely made some changes back and forth as your brain juice started flowing), you’re ready to put pen to paper.

Finding The Concepts And Relationships

In Rand’s WBF he mentioned a tool called nTopic. That was new to us, but we were unable to get it to work at this time. However, we’ve been using Alchemy API. The company lets you demo their AlchemyLanguage API here (h/t to Matt Brown from Moz for introducing this to me). The purpose of the API is to provide language processing and text analysis for apps that need such a thing. For us, the demo suits our purpose. It’s far from perfect; “directional” is a much more apt word. But I’m a huge fan of directional data over guesswork.

For our first step, we have our topic and a handful of “top of mind” keywords to find relevant pages in Google. Simple enough: find a competitor’s page (one that has a lot of copy) and paste in the URL, or copy the relevant text and paste that in. Protip – if Google is doing its job well, the first few pages should be good candidates (duh!!!). I tend to find Wikipedia pages work in many cases, but that’s hardly the only stop you should make. Get a few, run them through.


Next, toggle through the options. I like entities, keywords, concepts, and relations. Using the example above, in “concepts” we quickly pick up that Frank Zappa is relevant to Captain Beefheart, The Mothers of Invention, some record labels, and the Synclavier (something Zappa used to compose). Do these sound like relationships that might prove your piece is worthy of Google’s interest? Maybe! Clearly Alchemy API does not know what Google considers relationships, but not only is this an easy way to hopefully get close, it also provides great examples of things to include in copy (where relevant to the goal). In other words, Google aside, this just helps make good content.


The relations tab is also quite useful, identifying subject, action (predicate), and object. There’s your triples. RDF and semantic web for the win.
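Illustratively (this is my own toy representation, not anything the AlchemyAPI demo exposes), you can think of those triples as plain subject–predicate–object tuples, which makes the “relationships” idea very literal:

```python
# Toy representation of extracted triples as (subject, predicate, object).
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical facts of the kind a relations tab might surface:
facts = [
    Triple("Frank Zappa", "founded", "The Mothers of Invention"),
    Triple("Frank Zappa", "composed on", "the Synclavier"),
    Triple("Captain Beefheart", "collaborated with", "Frank Zappa"),
]

# Once facts are triples, finding everything said about an entity is a filter:
def relations_for(subject, triples):
    return [(t.predicate, t.object) for t in triples if t.subject == subject]

print(relations_for("Frank Zappa", facts))
```

That filter is, in miniature, the kind of lookup a semantic index makes cheap once content has been decomposed this way.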



Where To Use This

This can be used anywhere there is a need for those squiggly little shapes we call letters – from blog posts to resource centers to brochure pages. Last week I wrote on ISOOSI about mission marketing; I think it works in that marketing approach as well. We do a lot of eCommerce at Greenlane, so we’ve gone so far as to try to beef up category pages this way too. If you’ve ever heard me talk about SEOing a vending machine, I think this is one great way to go about it. Granted, it’s a somewhat unorthodox and harder sell, but there’s little valid argument against it. This will help you create a complementary experience on a page that otherwise typically just serves people’s transactional searches while disregarding informational ones.

Hell, everything I wrote about so far went into the creation of this very post, in which some of the concepts were really built by visiting Alchemy API more than a few times.

As SEOs, we’re still at the very early stages of understanding all this. Things may change drastically, for better or worse. We’ve all gone down the “authorship” path and come up empty. But that’s what makes our industry so great – we have a passion for riding the first wave and being early adopters. When our work hits, it hits big. I see absolutely no downside to learning more about topic modeling and semantic search, or spending extra time doing some concept discovery. If it does nothing for your SEO, it will at least ameliorate your copy and give your users more quality and value. That can only benefit your marketing goals.

Javascript and the Modern SEO (or, how not to recommend fear to your clients)

Going back at least 10 years, Google has always crowed about Googlebot’s improvements in rendering and understanding “formerly” prohibitive languages and media. Some of us older SEOs remember the announcements of breakthroughs in crawling Flash, or when Google was supposedly starting to understand the words embedded in images (with some kind of OCR tech). Javascript, which (to be frank) is blindly admonished by many SEOs, is one they’ve always been working on (see my own 2010 test). So when they say they’ve improved, it’s a little like crying wolf.

Truth is, “improve” is exactly what they did. They had to get better at this if they really wanted to be able to catch spam.

Not all Javascript is bad. In terms of crawlability, Javascript (in the code) isn’t one I tend to worry about. I rarely include much on Javascript in the tech audits I perform unless absolutely needed. You’ll never hear me indiscriminately say, “rip out this Javascript – Google can’t read it,” without deeply reviewing it. I’ve (sadly) seen audits from our clients’ past SEO companies that did just that, costing a decent chunk of budget to the client who trusted wholeheartedly in a false recommendation.

However, in terms of PageRank being passed, that’s another matter. There I still err on the side of caution and, when necessary, recommend alternate methods just to be sure. So let me be clear – I’m not dismissing Javascript as a whole. I’m just giving it the proper amount of attention.

Even Google Webmaster Tools seems to treat a Javascript link as a link, in both the “links to your site” report and the 404 page report. True, the Google Webmaster Tools department may still not be 100% tied into the organic search team, but I found this link in my 404 report this morning:

<script>// <![CDATA[
voto("");
//chartScores();
function forzaNuovo() {
    var div = document.getElementById("contgraph");
    var link = document.getElementById("link_sotto"); = '';
    div.innerHTML = "<img id=\"loadimg\" src=\"/images/load.gif\"/>";


So that URL (one that appears nowhere else on my site or in the code of the linking page) was causing a 404 to show in my Webmaster Tools report.

If interested, here’s the page so you can see the code yourself.

The Javascript You Still Need To Scrutinize

However, Javascript that renders on the client, and maybe even deferred Javascript (after the </body> tag), is a different beast. Simply put, this is Javascript that does not appear in the source code (FeedTheBot and Richard Baxter wrote some helpful pieces). But as you’d expect, Google has also said they’re working on this – not to mention the similar problem with iframes. In this video (which was actually recorded May 8, 2013), Matt Cutts says they’re a couple months away. At this point, that was well over a year ago.

So while we hear that Google is getting closer, we also hear tips from Google on making this kind of Javascript accessible to spiders (ahem, hashbangs!!!). Very reminiscent of when Google said in 2009, “we’re pretty good at consolidating duplicate content, but oh by the way… can you start using this new canonical tag?”

I’m working with a client now who’s struggling with indexation from a very liberal AngularJS implementation. It’s a very large brand, so it has major indexability, yet readability is definitely an issue when you look in the SERPs or Fetch as Google. I still strongly recommend the extra coding to make these pages accessible (even if it’s as simple as a service like Brombone), or finding another SEO-friendly way of creating the same dynamic functionality you get from some Javascript frameworks. It’s as simple as that – in this case, we have Javascript that truly needs to be optimized for search engines.
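The prerendering workaround boils down to user-agent routing: detect a crawler and serve a static HTML snapshot instead of the client-rendered app. A bare-bones sketch (function and token names are mine; services like Brombone handled the snapshot generation itself):

```python
# Minimal sketch of crawler-detection routing for prerendered snapshots.
# Token list is illustrative, not exhaustive.
CRAWLER_TOKENS = ("googlebot", "bingbot", "baiduspider")

def wants_snapshot(user_agent):
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def serve(user_agent, app_html, snapshot_html):
    # Crawlers get the static snapshot; everyone else gets the JS app.
    return snapshot_html if wants_snapshot(user_agent) else app_html

print(serve("Mozilla/5.0 (compatible; Googlebot/2.1)", "<app>", "<static>"))
```

The key caveat: the snapshot must match what users see, or you drift into cloaking territory.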


Well, this wasn’t a very long post… but nonetheless.  My hope is to urge some SEOs to stop being afraid of Javascript and blindly suggesting to clients it’s the devil. That’s not being a good consultant. The modern SEO has a lot of value in the marketplace today, so it’s worth the time to really learn what options are available from the technical SEO front. Our whole industry always needs integrity where it can get it; this is a perfect place to show your mettle.


Making Better SEO Reports For Your Clients

Ah, the SEO report. Sometimes the bane of our existence. Some agencies spend the majority of their time creating detailed monthly monstrosities, while others might send quick, white-labeled exports. Meanwhile, smart companies (like Seer) look for ways to use APIs and programming to speed up data pulling. At Greenlane, we took this approach as well; Keith, my partner and incurable data nerd, created our out-of-the-box reports to pull API data on traditional SEO metrics like rankings (yes – we still believe in the value), natural traffic (at the month-over-month and year-over-year level), natural conversions (same range), and every necessary target landing page metric we could think of. Then, after discussing clients’ own KPIs, we add more obligatory reports to our default set.
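The month-over-month and year-over-year math those pulls automate is simple enough to sketch (the session numbers here are made up):

```python
# Illustrative month-over-month / year-over-year deltas for a traffic report.
def pct_change(current, previous):
    if not previous:
        return None  # avoid dividing by zero when a period has no data
    return round((current - previous) / previous * 100, 1)

sessions = {"2014-09": 12400, "2014-08": 11800, "2013-09": 9000}

mom = pct_change(sessions["2014-09"], sessions["2014-08"])  # vs. last month
yoy = pct_change(sessions["2014-09"], sessions["2013-09"])  # vs. last year
print(mom, yoy)
```

Automating this arithmetic is the easy part; the point of the section below is what you do after the numbers land in the sheet.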

But pulling data is only a means to an end. Data exports – especially the scheduled kind – are huge time savers. However, the downside to these automatic data pulls is that they remove the necessity of going into analytics platforms to “poke around.” Simply put, you need to look for trends, see how different data points correlate, and investigate why things are (and are not) happening as expected. You need to keep notes of what you want to check each month when you pull your reports. You need to let data inspire questions and direct you to answers. This data is what should be driving your day-to-day optimizations.

No child ever wanted to be a “report monkey” when they grew up. You shouldn’t be one either.

Don't Be A Report Monkey

Learn To Love The Reports

I’m guilty. In a past life, I was part of a company that spent so much time – by hand – downloading Omniture reports, copying and pasting cells, customizing charts, running formulas, and beautifying spreadsheets. I can make a spreadsheet look like a work of art (though Annie will always have me beat). It took 10+ hours a month. Looking back, this was a total waste of clients’ money. That’s not what we were hired to do, yet we got away with it. Granted, I do believe the aesthetics of an attractive report can at least semi-consciously suggest to the recipients that your agency has talent and the money to invest in quality output (where that “money” may indicate success), but that’s only going to help you for so long. It’s like seeing a beautiful deck at a conference presentation – like it or not, it does give the perception of capability. This is the marketing industry, after all.

But when you’re spending so much time pulling, shaping, tweaking, and formatting, you’re spending less time being a marketing detective.

I’m the guy in the company who (probably annoyingly) squawks about fonts, consistency, and aesthetics, etc… all for the reasons above. But Keith and I both feel that the reports not only have a value to the client, but a value to our team as well. These reports ultimately make our job easier. The process of creating these timely reports – believe it or not – is what makes us better at our jobs:

  • Reporting helps marketers find trends they can use to tweak campaigns
  • Reporting helps marketers come up with strategies and tactics they can try out on other clients as well
  • Reporting helps improve your ability to make educated guesses
  • Reporting gives you the ability to tie your work to ROI and validate your job
  • Reporting helps you come up with areas of opportunity that could improve the marketing mix which might otherwise go unnoticed forever
  • Reporting helps you learn Excel (vital if you’re new to the industry)

What The Clients Really Want To See

I’ve worked agency-side most of my professional life. I did, however, have a brief stint as a client. It was very useful, as it helped me understand the daily challenges of an in-house marketer – especially the many directions they’re often pulled in. When I first got reports from our PPC vendor or social marketing vendor, I wanted to tear into them. Talk strategy. Get the learnings. But I was busy as hell. Eventually I just wanted the most impactful highlights.

An executive summary or a quick blurb of succinct, natural-language explanation can go a long way, especially in companies where these reports get passed around. You know the frustration you feel when you see a slide deck on SlideShare but can’t make any sense of the slides? You missed the accompanying presentation, and clicking through sometimes leaves you more confused than ever. It doesn’t mean the slides were bad or valueless – it just means the context wasn’t there. A good executive summary provides the context.

Here’s an example of something a client might see at the bottom of one of our spreadsheet reports (click for larger image):

Sample summary

However, executive summaries can be dangerous for clients if executed poorly. Many clients tend to accept the executive summary without question. Whether you have a client who uses the executive summary to dig into your brain, or one who just accepts it as is, you owe it to them as their hired contractor to provide the information they really need. Don’t let their lack of questions lead you into creating valueless executive summaries.

Clearly I think natural language is extremely important in telling a marketing (and data) story. Another option we recently discovered (and strongly recommend you check out for yourself) is Wordsmith For Marketing, a new service that can actually write textual reports based on data, saving your team time. We’ve started working with them and are really blown away by the exports. How a computer is writing reports like these is beyond me:


This is just a part of the long, detailed PDF. The service pulls the data from a Google Analytics connection, and lets you go in, move items, and add your own content. See the summary above? That was completely written by the computer, using phrases like “moderate loss” and “conversion rate also slipped.” Pretty incredible, with a very cool roadmap of features to come.

This is the first “push-button” report I’ve seen that actually provides contextual value, but we still encourage our team to take it further. Since Wordsmith easily allows you to add bullets and more context, we ask our team to fill in any gaps by affixing more observations and recommendations right into the report. For example, did we work on a specific campaign last month (with or without goal tracking)? Wordsmith won’t know, so our account managers must include all that. It’s a very useful merging of technology and manual digging that still cuts down a ton of hours.


Imagine a client running eCommerce product pages on a modern JS framework. It’s responsive and sexy, but it’s not drawing search traffic. Data could suggest an evaluation of the code, where you might find AngularJS – something you can drive to fix with proxies. Alternatively, imagine a client has tons of duplicate product pages – immediately your instinct is to pull the pages and put in robots/noindex solutions and canonical tags. Yet the data could suggest that Google already figured out the duplication issue and is still driving good traffic to the dupe pages regardless. Finally, imagine a client got a little too aggressive with a former link campaign and suddenly got stuck with an algorithmic penalty on a deeper landing page only. Digging into a site’s analytics can quickly help you pinpoint the problem, give you a course of correction, and help develop the priority.

These are examples you don’t get from just topical exports. The data can help you develop, prioritize, and execute all day long. Sure, it’s a pain losing the natural search keyword data to [not provided], but while that adds complexity to the keyword work in SEO, there are still plenty of other SEO initiatives and experiments you can easily create just by making deeper data dives an important part of your day-to-day, or by providing reports that you and your clients truly find valuable.


Embrace and optimize (see what I did there?) your SEO reports, but make sure you’re keeping the goals of these reports in mind all along. Once completed, the time should be spent analyzing the data and creating strategies, not creating the reports themselves. If your goals aren’t to empower your clients and empower yourselves, while holding your own feet to the fire to achieve results, you’re probably doing it wrong. Creating the right reports should be for educating both you and your clients, thus helping you really learn your chops as a marketer, while allowing the client to see the benefits of your great work.

How Entities (and Knowledge Cards) Can Help With Query Intent

Entity optimization as a big SEO play isn’t quite upon us yet. It’s a slow-growing Google addition. I know – it frustrates me too. So much potential, which I believe will greatly improve search results in the future. Google isn’t yet showing the fruits of everything it knows through entities, whether through cards or search results – at least not relative to the way they rank on keywords alone.

But can knowledge cards help bring qualified traffic while considering searcher intent? SEOs always talk about searcher intent. Anyone who’s been doing SEO for a while knows that building for intent can be a challenge.


Take a query like “batman the dark knight”. Was the searcher looking for the 2008 movie? The graphic novel? The upcoming game? Were they looking to buy something, or just curious about a release date? What the hell were these people thinking? This is certainly very top of the funnel stuff, and would normally yield lower conversions, but it is where many Google non-power users would start.

Google knows these searchers expect them to be mind readers. They’re keenly aware of this. They may be working on mind-reading devices in their labs (at which point I will finally invest in the tin-foil hat – I’ve got a lot of junk swimming in my head that should stay hidden). But in the meantime, through their results they give us personalized search, or this cute little cluster of links, though I doubt many click on anything here:


But if you properly create an entity, you can get better “related results” in the knowledge graph:

Related Results

Pop into Freebase and look up either of these entities, and you’ll see the details above listed out. Coincidence? Probably not – the data could have come from there. We know the Google-owned Freebase is part of their brain now. But unfortunately, that huge database of great information (which, granted, needs to be checked against other sources) simply isn’t producing results yet. Whether it’s a limitation in the knowledge card product or limitations in processing the data, I’m not sure – but I’m always hopeful Google steps it up soon.

Of course I recommend optimizing now and getting your entities in place for when Google pushes the pedal to the metal.

But for those who are working on campaigns where entities are being shown, you’re in luck. Google’s using your search history and their knowledge cards to personalize the results – sometimes in a more valuable way than the general results.

The Jaguar Example

If I were doing SEO for Jaguar, a well-known luxury brand car, I already have the benefit of Google knowing what my product is. They show some of it in their knowledge card with a simple “jaguar” search:

Jaguar Result

Obviously this isn’t all Google knows – just what they feel like showing at the present time. They’re getting this from Google+, Wikipedia and Freebase at a minimum.

Since Ralph Speth can’t go back in time and choose a new name for the company, they have to compete for search result real estate and millions of monthly searches for the term “Jaguar” – that is, against other pages that want to rank, like the Jacksonville Jaguars, the animal, the Atari Jaguar, comic book characters, and movie titles.

Now, if I were doing SEO for the defenders of wildlife, and I wanted this top-of-the-funnel term to potentially bring me traffic and awareness, the default (above) results suck for me. It’s all cars, football teams, or pictures.

But Google does something cool…

Search history plays a role in results. Google uses keywords, and ideally entities, to see relationships between queries. Queries like “animal,” “panthera,” and “wild animal” are related to jaguar. Specifically, a query for “panthera,” followed by a new search for “jaguar,” gives a different result. The Jaguar car listings, ads, and knowledge card are suppressed in favor of an option where one can click to refine the search. This isn’t even slightly hidden. See the difference between the results below and the example above?

Related Searches

Clicking the link (pointed to by the red arrow) shows a new refined search where the wildlife site has a listing (at the time of this writing). The query has been changed to “jaguar animal,” but, through a new click-path, the site has the opportunity to benefit from the “jaguar” head term. I believe this is at least partially entity driven. And I believe this is a small example of how entities can be used in the future as Google’s products become more robust.

What do you think?  Am I seeing a connection where there isn’t one?

The New Google Doesn’t Like Old SEO

I read – and commented on – a great post called Panda 4.0 & Google’s Giant Red Pen by Trevin Shirley. Panda 4.0 just hit; the SEO space is hiding under its desk, with some reacting either out of panic or for show.

It’s definitely news, but at this point, I don’t see any reason to scream from the rooftops at Google. It’s what we should be expecting by now.

In 2011, the first Panda showed us Google is not afraid to drop atom bombs. Panda opened the door for Penguin, and many updates have come since. Matt Cutts said he wished Google had acted sooner, and in his shoes, I’d probably agree.

Let’s not forget how spammy the results used to be:

Google junk

I can imagine the conversation at the Googleplex between the webspam and search team:

“Man, how did you let this get so bad?”

“Me? I thought you were paying attention…”

“Look – we need to fix this. But the algorithm can only be tweaked so hard. I mean, it’s not Skynet yet.”

“But people think it is…”

“We’re going to lose our shirts if we don’t act quick. How about we take drastic measures?”

“But the SEO community will have a cow.”

“But hopefully the rest of the world won’t notice and just start loving, trusting, and using a cleaner Google!”

“Agreed. Hey Navneet Panda… do you have any ideas?”

It’s A New Google – We Need To Accept It, Rebuild


Maybe they should have named these things Godzilla instead of Panda or Penguin. The battles that ensued since the birds and the bears were nasty. Some search results were leveled. I’m not being dramatic for the sake of a metaphor – I’m pretty sure we can all agree the results have never been the same. Some SEOs were/are slow to give up the fight. Some agencies still sell SEO that doesn’t work. Others, however, have realized the new rules – while different – still offer great opportunity.

Google declares their war on spammers a victory, noting black hat forums have slowed down. They’ve admitted to throwing some FUD into the mix, like Kim Kardashian’s publicist might do, but for the greater good of their mission – to fix the results and uphold their “reputation.” All the hate mail and tweets to Matt Cutts aren’t going to change this. I’m pretty sure he’s holding steadfast. While Google won’t nod to the fact that some good got swept up with the bad, they obviously know it.

But honestly, I think it works for me. I think the changes, and casualties, were necessary. Were they supposed to wait until they were perfect? Plus I was getting tired of the lack of imagination… not that some of the dark arts weren’t brilliantly designed and executed. But in some sectors, SEO is very slow to change.

What I mean is, I was missing the marketing. In 2007 I was in a full-service agency’s marketing department doing SEO. Yet, SEO didn’t feel like marketing then. It was still firmly planted in web development. But in my situation, marketing and web development were siloed. Our departments weren’t friends (some internal politics at play). As asinine as that sounds now, I learned it wasn’t uncommon in big agencies back then. So, to make our SEO offering work, I had to tie “marketing” and “technical” together.

As evolution would have it, there’s no doubt that SEO is a marketing channel now… so I kind of lucked out by getting an early jump on it. The more I tied the two together, the more long-lasting the results were. Even today. It’s the only real Panda/Penguin proof strategy I’ve seen.


Like many rock bands, Google has changed their formula. I agree – relatively speaking, Google now works pretty well. Or at least they’re finally poised to substantially improve. And that’s from me – a guy who hates change. Update your website or UI and I throw a temper tantrum. But realistically, has anything ever stayed the same? Did David Bowie not continue to produce great music, albeit different? Did Empire Strikes Back not kick more ass after changing directors? Did Windows 8 not improve upon Windows 7?

Granted, it’s still Google’s property, and they can do with it as they please, so if they only want to represent a portion of the web, I suppose they have that right. Maybe in hindsight it was kind of ambitious to attempt to organize all the world’s webpages. Ah, the dreams of two bright-eyed Stanford students.

In his post, Trevin quoted something from Hacker News that I found very interesting: “We are getting a Google-shaped web rather than a web-shaped Google.” I sat with this for a few days. Ultimately I don’t think we’re getting a Google-shaped web or a web-shaped Google. I understand the concern, especially when Google is a massive part of discovering new content and a provider of big revenue. But the web is much larger than Google. The citizens that create on the web, outside of the SEO bubble, are very much their own people, inspired by anything and everything. Alternatively, a web-shaped Google – which I argue was their first attempt – was a bit unrealistic.

When I worked with a client who was an innocent casualty of an update, I used to get angry. I used to think Google was a bunch of jerks. Then, I got creative, and found ways to get the client back onto Google’s radar – usually to a larger traffic and brand-recognition increase. Plus, I started relying on some of the other valuable internet marketing tools and channels. Talk about silver linings.

But honestly, no client I’ve ever had who got hurt by a Google update was a true victim. Google always told us they wanted to rank the best, most useful content for their users. I’ve worked with some clients who got the traffic, but only because Google didn’t realize they weren’t the best. I’ve seen sub-par, homogenized content ranking well, and thought, “meh – might as well ride it while Google is still dumb.”

Now, looking back, if they got swept up in an update, it’s because they really weren’t doing more than the bare-bones basics – Google simply stepped up its game. These sites weren’t the originators of content, topics, and incredible ideas. They were just going through the motions.

Maybe it’s time to accept Google has graduated from grade-school.


In another post I wrote about lazy SEO. The more I think about it, old-school SEO is lazy SEO because it simply doesn’t move the needle enough to justify hitching your wagon to it. I truly think if you haven’t moved on by now, you’re only going to be playing catch-up in the next couple of years.

So what do you think? Am I right? Or have I misguided myself?

Failing Reinclusion Requests? How To Uncover Those “Harder To Find” Links.

Sometimes desperate times call for desperate measures. This post is about a desperate measure.

We had a client with a manual link penalty. We did some work (using my outline from this post). Rankings started going up, and traffic and conversions started climbing. Then, a few days later, the next Google notification came in. It’s like playing digital Russian roulette with those things – you’ll either be thrilled or be in a lot of pain.

This time Google said they “changed” our penalty, as there were still some spammy links out there.

Remember, not all penalties have the same impact. Clearly ours was lessened (which was continually proven in the weeks to follow), but our client – rightfully so – wanted the whole penalty removed. The problem was we couldn’t find any more bad links. Everything from Ahrefs, OSE, Google Webmaster Tools, Bing Webmaster Tools, Majestic, etc. was classified and handled appropriately.


Google’s notifications sometimes show additional samples of poisonous links. This time we were shown only two links of forum spam, something we had found zero instances of previously. Old school, dirty forum spam is usually belched out in huge, automated waves. We asked the client, who asked their previous vendors, if they had any knowledge of the link spamming. Nobody knew anything about it, so any chance of getting a list of these URLs (which was probably very low anyway) was now nil. But how did we miss all of it?

The problem was, this forum spam was so deep in the index that the major tools couldn’t find it. Even Google’s Webmaster Tools report didn’t reveal it. That’s right – Google’s notification was showing us that these links existed, but Webmaster Tools wasn’t giving us any insight into them. They never got any clicks, so we weren’t finding them in Google Analytics. Google’s vague link reporting functions and vague, boilerplate notifications weren’t helping us help them.

Matt Cutts Facepalm - Google

The only way to find these deep links was through Google’s own search engine. Unless you have a staff of hundreds and nothing but time to manually pull results and analyze them one by one, this didn’t seem possible. But we came up with a reasonably easy process using Cognitive SEO, Scrapebox, Screaming Frog, and good old Excel to emulate this activity with at least some success.

Note: I feel obligated to tell you that this is not an exhaustive solution. I don’t think there is one. There are limitations to what Google will actually serve and what the tools listed in this post can actually do. The good news: Google will likely release you from a penalty even if you didn’t clean up every single spammy link. All the clients I’ve gotten out of the doghouse still had some spam out there we weren’t able to find. To Google’s credit, at least they seem to understand that. Hopefully this process will help you get the job done when your repeated reinclusion requests are denied (even after really, really trying).

Determining the footprints

We’re going to have to beat Google into giving us opportunity. The problem is, we’re going to get a serious amount of noise in the process.

We know the inanchor: operator can be helpful. It’s not as powerful as we’d like, but it’s the best we have. A search in Google like inanchor:”bill sebald” will ask Google to return sites that link using “bill sebald” as anchor text. This will be very valuable… as long as we know the anchor text.


Step 1. Get the anchor text

This can be done in a few ways. Sometimes your client can reveal the commercial anchors they were targeting, sometimes they can’t. All the major backlink data providers give you anchor text information. My favorite source is Cognitive SEO, because they give you a nice Word Cloud in their interface right below their Unnatural Link Detection module (see my previous post for more information on Cognitive).

word cloud

Collect the anchor text, paying special attention to any spammy keywords you may have. I would recommend you review as many keywords as possible. Jot them down in a spreadsheet and put them aside. Don’t be conservative here.

You also want to be collecting the non-commercial keywords. Like, your brand name, variations of your brand name, your website URL variations, etc. Anything that would be used in a link to your website referencing your actual company or website.

Together you’ll get a mix of natural backlinks and possibly over-optimized backlinks for SEO purposes. We need to check them all, even though the heavily targeted anchors are probably the main culprit here.
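To bridge into the scraping step, here’s a minimal sketch (the anchor phrases below are placeholders) of turning your collected anchor text into the inanchor: queries you’ll feed to Scrapebox:

```python
# Turn a list of collected anchor text phrases into inanchor: queries.
def build_inanchor_queries(anchors):
    queries = []
    for anchor in anchors:
        phrase = anchor.strip()
        if phrase:
            # Quote the phrase so Google treats multi-word anchors as one unit
            queries.append('inanchor:"{}"'.format(phrase))
    return queries

# Placeholder anchors - use your real brand and commercial keywords
anchors = ["bill sebald", "cheap widgets", "greenlaneseo.com"]
for query in build_inanchor_queries(anchors):
    print(query)
```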

Get The Results

This is where Scrapebox comes in. I’m not going to give you a lesson (that’s been done quite well by Matthew Woodward and Jacob King). But if you’re not familiar, this powerful little tool will scrape the results right out of Google, and put them in a tabular format. You will want proxies or Google will throw captchas at you and screw up your progress. Set the depth to Scrapebox’s (and Google’s) max of 1,000, and start scraping.

Step 1: Enter in your queries

In the screenshot example below, I entered one. Depending on the results, and how many commercial anchor text keywords you’re looking for, you’ll want to add more. This might require a bunch of back and forth, and exporting of URLs, since there’s a limit on how much you can pull. I like small chunks. Grab a beer and put on some music. It helps ease the pain.

But don’t just do inanchor: queries. Get creative. Look for your brand names, mentions, anything that might be associated with a link.

Step 2: Choose all the search engines as your target

In most cases you’ll get a lot of dupes, but Scrapebox will de-dupe for you. In the errant case where Bing has some links Google isn’t showing, the extra engines may come in handy. Remember – Google doesn’t show everything it knows about.

Step 3: Paste in your proxies

It seems Google is on high alert for advanced operators en masse. I recommend getting a ton of proxies to mask your activities a bit (I bought 100 from, a company I’ve been happy with so far.  H/T to Ian Howells)

scrapebox graphic

Step 4: Export and aggregate your results

After a few reps, you’re going to get a ton of results. I average about 15,000. Scrapebox does some de-duping for you, but I always like to spend five minutes cleaning this list, filtering out major platforms like YouTube, Yahoo, Facebook, etc., and removing duplicates. Get the junk out here and have a cleaner list later.
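If you’d rather script the cleanup than do it by hand, a small sketch like this (the platform list and example URLs are placeholders) can filter out the major platforms and de-dupe in one pass:

```python
from urllib.parse import urlparse

# Domains to drop before deeper analysis - placeholder list, extend as needed
EXCLUDE_DOMAINS = {"youtube.com", "yahoo.com", "facebook.com"}

def clean_url_list(urls):
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www/non-www as the same site
        if host in EXCLUDE_DOMAINS:
            continue
        if url not in seen:  # remove duplicates, keep first occurrence
            seen.add(url)
            cleaned.append(url)
    return cleaned

# Example: a raw Scrapebox export with junk and a duplicate
scraped = [
    "http://www.youtube.com/watch?v=abc",
    "http://spamforum.example/thread?page=2",
    "http://spamforum.example/thread?page=2",
]
print(clean_url_list(scraped))
```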

Find The Links

Got a huge list of webpages that may or may not have a link to you? Wouldn’t it be great to find the links without checking each page one by one? There’s a way: Screaming Frog to the rescue.

Copy and paste your long list out of Excel and into a notepad file. Save as a .txt file. Then, head over to Screaming Frog.

Choose: Mode > List

Upload your recently created .txt file.

Screaming Frog 1

Then choose: Configuration > Custom

Enter in just the SLD and TLD of your website. See below:

Screaming Frog 2

Now, when you click start, Screaming Frog will only crawl the exact URLs in your text file and check the source code for any mention of your domain. In the “custom” tab, you can see all the pages where Screaming Frog found a match. Be careful: sometimes it will find mentions that aren’t actually hyperlinked, email addresses, or hotlinked images.

Boom. I bet you’ll have more links than you originally did, many of them pulled from the supplemental hell of Google’s index. Many of these are in fact so deep that OSE, Ahrefs, Majestic, etc. never discover them (or choose to suppress them). But odds are, Google is counting them.
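If you ever need to emulate this check without Screaming Frog, a rough sketch of the same idea – fetch each scraped URL and look for any mention of your domain in the source – might look like this (the domain and URL list are placeholders):

```python
import urllib.request

def source_mentions(html, needle):
    # Case-insensitive check for a domain anywhere in the page source;
    # like the custom filter, this also catches unlinked mentions
    return needle.lower() in html.lower()

def pages_mentioning(urls, needle):
    matches = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # dead or blocked pages are common this deep in the index
        if source_mentions(html, needle):
            matches.append(url)
    return matches

# Example (placeholder domain):
# found = pages_mentioning(scraped_urls, "greenlaneseo.com")
```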

The (Kinda) Fatal Flaw With This Procedure

Remember earlier when I said this wasn’t a perfect solution? Here’s the reason. Some of the pages Google shows for a query are quite outdated, especially the deeper you go in the index. In many cases you could grab one of the URLs that did not have a link to your site (according to Screaming Frog), look at the Google cache, and find the link there. Did Screaming Frog fail? No. The link has vanished since Google last crawled the URL. Sometimes these deeply indexed pages don’t get crawled again for months. In that time the link could have been removed or paginated to another URL (common in forum spam). Maybe the link was part of an RSS or Twitter feed that once showed in the source code but has since been bumped off.

The only way I know to overcome this takes a lot of processing – more than my 16GB laptop had. Remember the part where you upload the full list of URLs into Screaming Frog in list mode? If you want to pull off the governors, you can actually crawl these URLs and their connected pages as well by going to Configuration > Spider > Limits and removing the “limit search depth” tick, which is automatically set to a crawl depth of ‘0’ when switching to list mode. I was able to find a few more links this way, but it is indeed resource intensive.

Limit Search Depth

Has It Really Come To This?

This is an extreme example on rare cases.

Yesterday we had a prospect call our company who was looking for a second opinion. Their site had a penalty from some SEO work done previously. The current SEO agency’s professional opinion was to burn the site. Kill it. Start over. My gut-second opinion was that it should (and could) probably be saved. After all, there’s branding on that site. The URL is on their business cards. It’s their online identity and worth a serious attempt at rescue. In this case I think extra steps like the above might be in order (if it should come to that). But if it’s a churn-and-burn affiliate site, maybe it’s not worth the effort.

Post-Penguin, we find that removing the flagged links – alongside links simply becoming less and less valuable as the algorithm refines itself – does keep rankings from bouncing completely back to where they were, in most (but not all) cases. That’s a hard pill for some smaller business owners to swallow, but I have never seen a penalty removal – where all the levels of rank-affecting penalty were removed – keep a site from eventually succeeding in time. Time being the keyword.

So yeah, maybe it really has “come to this,” if your site is worth saving. At the very least you’ll be learning your way around some incredibly powerful tools like Scrapebox, Cognitive SEO, and Screaming Frog.

I’m excited to see if anyone has a more refined or advanced way to achieve the same effects!


Review of Linkody

There must be thousands of SEO tools. While many are junk, a few great tools rise up each year and grab our attention. They’re often built for very specialized needs. Of all the industries these brilliant developers could build in, they chose SEO. I’m always thankful and curious. As a fan of SEO tools, both free and paid, I’m excited to learn about new ones.

A few months ago I got an email from François of Linkody asking for some feedback. It did a nice job of link management and monthly ‘new link’ reporting. Pricing was very low, it’s completely web-based, and it’s very simple and clean. It pulls from the big backlink data providers, and even has a free client-facing option (exclusively using Ahrefs). Great for quick audits. I’ve used it quite a bit myself, and was happy to give a testimonial.

The link management function isn’t new to the SEO space. Many tools do it already, like Buzzstream and Raven – and they do it quite well. Additionally, link discovery is an existing feature of tools like Open Site Explorer, yet this is an area where I see opportunity for growth. I love the idea of these ‘new link’ reports, but honestly, haven’t found anything faster than monthly updates. I know it’s a tough request, but I mentioned this to François. By tracking “as-it-happens” links, you can jump into conversations in a timely manner, start making relationships, and maybe shape linking-page context. You might even be able to catch some garbage links you want to disassociate yourself from quicker.

The other day I received a very welcome response: “I wanted to inform you of that new feature I’ve just launched. Do you remember when you asked me if I had any plan to increase the (monthly) frequency of new links discovery? Well, I increased it to a daily frequency. Users can now link their Linkody account with their Google Analytics account and get daily email reports of their new links, if they get any of course.”

Sold. That’s a clever way to report more links, and fill in gaps that OSE and Ahrefs miss.

Click image for larger view


Upon discovering the new URL, you can choose to monitor it, tag it, or export.

The pros: Linkody picks up a bunch of links on a daily basis that some of the big link crawlers miss. You can opt for daily digest emails (think, Google Alerts). Plus it’s pretty cheap!

The cons: It needs Google Analytics. Plus, for the Google Analytics integration to track the link, the link has to actually be clicked by a user. However, for those who have moved to a “link building for SEO and referral traffic generation” model (like me), this might not be much of a con at all.

What’s on the roadmap?

As François told me, “next is displaying more data (anchor text, mozrank…) for the discovered links to help value them and see if they’re worth monitoring. And integrating social metrics.” Good stuff. I’d like to see more analytics packages rolled in. More data sources? Maybe its own spider?


If you’re a link builder, in PR, or a brand manager, I definitely recommend giving Linkody a spin. It’s a great value. Keep your eye on this tool.


Optimize NOW For Entities and Relationships

I remember, a few years ago, blowing the mind of a boss with a theory that Google would eventually rank (in part) based on its own internal understanding of your object. If Wikipedia could know so much about an object, why couldn’t Google? In the end, I was basically describing semantic search and entities, something that had already lived as a concept on the fringe of the mainstream.

Sketching It Out

Sketching out relationships on a whiteboard

In the last year, Google has shown us that they believe in the value of a semantic web and semantic search. With their 2010 purchase of Metaweb (makers of Freebase), the introduction of the knowledge graph, the creation of schema, and the sudden delivery of a new algorithm called Hummingbird, Google is having one hell of a growth spurt. It’s not just rich snippets we’re talking about, or results that better answer Google Now questions.

We used to say Google had an elementary school education. They understood keywords and popularity. Now it can be argued Google has graduated, and is now enrolled in Silicon Valley Jr. High School. Comprehension has clearly improved. Concepts are being understood and logical associations are being made. A person/place/thing, and some details about them (as Google understands it), are starting to peek through in search results.

Yesterday was my birthday. Yesterday was also the day I became Google famous – which to an SEO geek is kind of awesome. I asked Google a couple questions (and some non-questions), and it showed me I’m an entity (incognito and logged in):

  • how old is bill sebald
  • what is bill sebald’s age
  • bill sebald age
  • birthday of bill sebald

This produced a knowledge result (like we’ve seen a couple times before). Details on how I got this are illustrated deeper in this post:


The comprehension level has its limits. Ask Google “when was bill sebald born” or “what age is bill sebald” or “when is bill sebald’s birthday,” and no such result appears. For some reason an apostrophe throws off Google – querying “bill sebald’s age” vs. the version bulleted above returns no knowledge result. Also, reverse the word order of “bill sebald age” to “age of bill sebald” and there’s no result.

Then, ask “bill sebald birthday” and you’ll get a different knowledge result apparently pulled from a Wikipedia page. This doppelganger sounds a lot more important than me.



We know Google has just begun here, but think about where this will be in a few years. At Greenlane, we’re starting entity work now. We’re teaching our clients about semantic search, and explaining why we think it’s got a great shot at being the future. Meh, maybe social signals and author rank didn’t go the way we expected (yet?), but here’s something that’s already proving out a small glimpse of “correlation equals causation.” It doesn’t cost much, it makes a lot of sense for Google’s future, and seems like a reasonable way to get around all the spam that has manipulated Google for a decade.

A new description of SEO services?

I’m not into creating a label. Semantic SEO isn’t a necessary term. You might have seen it in some recent presentations or blog post titles, but to me this is still old-fashioned SEO simply keeping up with Google’s growth. This is the polar opposite of the “SEO is dead” posts we laugh at. Someone’s probably trying to trademark the “semantic SEO” label right now, or at least differentiate themselves with it. To me, as an SEO and marketer, we always cared about the intent of a searcher – semantic search brings us closer to that. We always cared about educating Google about our values, services, and products. We always wanted to teach Google about meaning (at least those of us doing LSI work and hoping it would pay off). If this architecture becomes commonplace, it becomes part of any regular old SEO’s job duties. Forget the label – it’s just SEO.

The SEO job description doesn’t change. Only our strategies, skills, and education. We do what we always do – mature right along with the algorithms. We will optimize entities and relationships.

Where have we come from, and where are we going?

Semantic search isn’t a new concept.

I think the knowledge graph was one of the first clear indications of semantic search. Google is tipping its hand and showing some relationships it understands. Look at the cool information Google knows about Urban Outfitters. This suggests they also know, and can validate, this information – like CEO info, NASDAQ info, etc. Google’s not quick to post up anything they can’t verify.


Click through some of the links (like CEO Richard Hayne) and you’ll get more validated info.


These are relationships Google believes to be true. For semantic search to work, systems need to operate seamlessly across different information sources and media. More than just links and keywords, Google will have to care about citations, mentions, and general well-known information in all forms of display.

Freebase, as expected, uses a triple store. This is a great user-managed gathering of information and relationships. But like any human-powered database or index, bad information can get in – even with a passionate community policing the data. Thus, Google usually wants other sources. Wikipedia helps validate information. Google+ helps validate information.

The results I got for my age (from Google above) probably came from an entry I created for myself in Freebase. The age is likely validated by my Google+ profile where I listed my birthdate. Who knows – maybe Google also made note of a citation on Krystian Szastok’s post about Twitter SEO’s Birthdays where I’m listed there too. I’m sure my birthday is elsewhere.

But what about my height? Google knows that too, and oddly enough, I’m fairly sure the only place on the web I posted that was in Freebase:

bill sebald height

But I also added information about my band, my fiance, my brother and sister – none of which I can seem to get a knowledge listing for. However, Google seems to have arbitrarily given one for my parents, who as far as I know are “off the grid.”


Another knowledge result came in the form of what I do for a living. This one is easy to validate (in this case helped along by several relevant links I submitted through Freebase):


This is just what Google wants to show, not all it knows

This is really the exciting part for me. When I first saw the knowledge graph in early 2013, it wasn’t just a, “that’s cool – Google’s got a new display interface,” type of thing. This was my hope that my original theory may be coming true.

In fact, in a popular Moz Whiteboard Friday from November 2012 called Prediction: Anchor Text is Weakening…And May Be Replaced by Co-Occurrence, I was hopeful again. There was a slight bit of controversy over how a certain page was able to rank for a keyword without the traditional signs of SEO (the original title mentioned co-citation, and Bill Slawski and Joshua Giardino brought some patents to light – see the post for those links). My first thought – and I can’t bring myself to rule it out – was that it’s none of the above; instead, this is Google ranking based on what it knows about the relationships of the topic. Maybe this was a pre-Hummingbird rollout sample? Maybe this is the future of semantic search? Certainly companies buy patents to hold them hostage from competitors. Maybe Google was really ranking based on internal AI and known relationships?

Am I a fanboy? You bet! I think the idea of semantic search is amazing. SEO is nothing if not fuzzy, but imagine what Google could do with this knowledge. Imagine what open graph and schema can do to feed Google information for building deeper relationships. Couldn’t an expert (ala authorship) feed trust in a certain product? Couldn’t structured data improve Google’s trust of a page? Couldn’t Google more easily figure out the intent of certain searches, and provide more relevant results based on your personalization and those relationships?

What if it could get to the point where I could simply Google the term “jaguar”? Google could know I’m a guitarist, I like Fender guitars, and I’m a fan of Nirvana (hell – it’s a lot less invasive than the data Target already has on me). Google could serve me pages on the Fender Jaguar guitar, the same guitar Kurt Cobain played. Now think about how you could get your clients in front of search results based on their relationships to your prospective searchers’ needs. Yup – exciting stuff.

Google is just getting started

An entity is an entity. Do this for your clients as well. The entries in Freebase ask for a lot of information that could very well influence your content production for the next year. Make your content and relationships on the web match your entries. At Matt Cutts’ keynote at Pubcon, he mentioned how they’re just scratching the surface on authorship. But I think authorship is just scratching the surface on semantic search. I think the big picture won’t manifest for another few years – but, no time like the present to start optimizing for relationships. At Greenlane we’re pushing all our chips in on some huge changes this year, and trying to get our clients positioned ASAP.

On a side note, I have a pretty interesting test brewing with entities, so watch this spot.

Step-By-Step Google Disavow Process (Or, How To Disavow Fast, Efficiently, and Successfully)

For one reason or another, plenty of sites are in the doghouse. The dust has settled a bit. Google has gotten more specific about the penalties and warnings through their notifications, and much of the confusion is no longer… as confusing. We’re now in the aftermath – the grass is slowly growing again and the sky is starting to clear. A lot of companies that sold black hat link building work have vanished (and seem to have their phone off the hook). Some companies who sold black hat work are now even charging to remove the links they built for you (we know who you are!). But at the end of the day, if you were snared by Google for willingly – or maybe unknowingly – creating “unnatural links,” the only thing to do is get yourself out of the doghouse.

Occasionally we have clients that need help. While it’s not our bread and butter, I have figured out a pretty solid, quick, and accurate method for when I do need to pry a website out of the penalty box. It requires some paid tools, diligence, a bit of Excel, and patience, but it can be done in a few hours.

The tools I use (in order of execution):

To get the most out of these tools, you do need to pay the subscription costs. They are all powerful tools. They are all worth the money. For those who are not SEOs, reading this post for some clarity, let me explain:

To truly be accurate about your “bad links,” you need as big a picture as possible of all the links coming to your site. Google Webmaster Tools will give you a bunch for free. But, in typical Google fashion, they never give you everything they know about in a report. Hell – even their Google Analytics is interpolated. So, to fill in the gaps, there are three big vendors: Open Site Explorer by Moz, Majestic SEO, and Ahrefs.

Wait – so why aren’t Ahrefs and Majestic SEO on my numbered list above? Because Cognitive SEO uses them in its tool. Keep reading…

Note: Click any of the screenshots below to get a larger, more detailed image.

Step 1 – Gather The Data

1. Download the links from Google Webmaster Tools.

Click Search Traffic > Links To Your Site > More > Download More Sample Links.   Choose a CSV format.

Google Webmaster Tools Links - Step 1

Google Webmaster Tools Links - Step 2

Don’t mess with this template. Leave it as is. You’re going to want to upload this format later, so don’t add headers or columns.

2. Download all individual links from Open Site Explorer to a spreadsheet.

3. Copy only the links out of OSE, and paste them under your Webmaster Tools export.

4. Remove any duplicate URLs.

At this point you should have a tidy list of each URL from Google Webmaster Tools and Open Site Explorer. Only one column of links. Next, we head over to Cognitive SEO.
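The merge-and-de-dupe steps above can also be scripted. Here’s a minimal sketch, assuming both exports keep the linking URL in the first column (the file names are placeholders):

```python
import csv

def read_first_column(path):
    # Assumes the linking URL is in the first column of the export;
    # adjust if your export layout differs
    with open(path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]

def merge_link_lists(*link_lists):
    # Combine exports into one column of links, dropping duplicates
    # while preserving order
    seen = set()
    merged = []
    for urls in link_lists:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Placeholder file names:
# merged = merge_link_lists(read_first_column("gwt_links.csv"),
#                           read_first_column("ose_links.csv"))
```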

Step 2 – Cognitive SEO Unnatural Link Detection

There are a number of SaaS tools out there to help you find and classify URLs and create disavow lists. I’ve heard great things about Sha Menz’s rmoov tool. There’s also SEO Gadget’s link categorization tool (everything they build is solid in my book). I once tried Remove’em with OK results. Recently Cognitive SEO entered the space with their Unnatural Link Detection tool. With a little bit of input from you, it applies its own secret-sauce algorithm. I found the system to be quite accurate in most cases, classifying links into three buckets: OK, suspect, and unnatural. More info on the AI here. Also, if you read my blog regularly, you might remember my positive review of their Visual Link Explorer.

First you tell Cognitive what your brand keywords are. Second, you tell it what the commercial keywords are. Typically, when doing disavow work for a client, they know what keywords they targeted. They know they were doing link building against Google guidelines, and know exactly what keywords they were trying to rank for. If the client is shy and doesn’t want to own up to the keywords – or honestly has no idea – there’s a tag cloud behind the form to help you locate the targeted keywords. The bigger the word, the more it was used in anchor text; thus, is probably a word Google spanked them over.

A note about the links Cognitive provides: Razvan from Cognitive tells me the backlink data is aggregated mainly from MajesticSEO, Ahrefs, Blekko, and SEOkicks. That’s a lot of data alone!

Below I’ve used Greenlane as an example. Other than some directory submissions I did years ago, unnatural link building wasn’t an approach I took. But, looking at my keyword cloud, there are some commercial terms I want to enter just to see what Cognitive thinks. Note: the more you fill in here, the better the results. The system classifies best when at least 70% of anchor text is labeled as brand or commercial.

Cognitive SEO screenshot 1

Click submit, and Cognitive quickly produces what it thinks are natural and unnatural links.

Cognitive SEO Screenshot 2

Cognitive produces nice snapshot metrics. I can quickly see what links I need to review (if any). In my case, Cognitive marked the directory work I did as suspect. Since I don’t have a manual or algo penalty, I’m not going to worry about the work I did when I was a younger, dumber SEO.

But, for a client who has a high percentage of bad links, this is super helpful. Here’s an example of results from a current client:

Cognitive SEO Screenshot 3

“This site has a highly unnatural link profile and it’s likely to be already penalized by Google.” This happens to be an all too true statement.

Next, Cognitive adds a layer of usability with the Unnatural Links Navigator.

Cognitive SEO screenshot 4

This tool basically creates a viewer to quickly toggle through all your links and (with some defined hotkeys) tag a site as “disavow domain” or “disavow link.” You get to look at each site quickly and make a judgment call on whether you agree with Cognitive’s default classification. 9 times out of 10 I agree with what Cognitive thinks. Once in a while I’d see a URL labeled “OK” where it really wasn’t; I’d simply mark it to disavow.

What should you remove? Here’s a page with great examples from Google. Ultimately, though, this is your call. I recommend to clients that we do a more conservative disavow first, then move to a more liberal one if the first fails. Typically I remove links that look like the examples on that Google page: pages with spun content, forum board spam, xrumer and DFB stuff, obvious comment spam, resource page spam, and completely irrelevant links (like a Viagra link on a page about law). PR spam, directories, and those sites that scrape and repost your content and server info have been around forever – currently I see no penalty from these kinds of links, but if my conservative disavow doesn’t do the job, my second, more liberal run will contain them. 9 times out of 10 my conservative disavow is accepted.
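The conservative-first approach can be expressed as a simple pattern filter. The patterns below are purely illustrative, not an exhaustive spam signature list; tune them to the profile you're looking at:

```python
import re

# Illustrative patterns for a first, conservative pass
CONSERVATIVE_PATTERNS = [r"forum", r"comment", r"profile", r"viagra", r"casino"]

# A second, more liberal pass might also sweep up directories and PR spam
LIBERAL_PATTERNS = CONSERVATIVE_PATTERNS + [r"directory", r"pressrelease", r"article"]

def flag_links(urls, patterns):
    """Return the URLs matching any of the given patterns (case-insensitive)."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [u for u in urls if regex.search(u)]
```

Run the conservative list first; if the reconsideration fails, re-run with the liberal list and resubmit.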

This part of the process might take a couple hours depending on how many links you need to go through, but this is obviously much faster than loading each link manually, and a lot more thorough than not loading any links at all. I believe if you’re not checking each link out manually, you’re doing it wrong. So turn on some music or a great TV show, grab a beer, tilt your chair back, and start disavowing.

Once complete, you’ll have the option to export a disavow spreadsheet and a ready-made disavow .txt file for Google.

Here are the full steps to make the most out of Cognitive SEO.

  1. Create a campaign with your site or client’s site.
  2. Once in the inbound link analysis view, click IMPORT on the top right. Choose Google Webmaster Tools as the Import Format. Choose the Google Webmaster Tools / Open Site Explorer .csv file. Click Import.
  3. Once links are appended, click “start the automatic classification now” and follow the steps.
  4. Click “Launch The Unnatural Links Navigator”, and click the “link” column to sort alphabetically.
  5. Toggle on each link to disavow individually, or choose one link per domain and disavow the domain. This will make sense once you’re in the tool.
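The disavow file Google expects is plain text: one `domain:example.com` line per disavowed domain, one full URL per disavowed link, and `#` comment lines for notes. Cognitive exports this for you, but here's a sketch of the format in case you ever need to build it by hand:

```python
def build_disavow_file(domains, urls, note=""):
    """Render a Google-format disavow file.

    '#' lines are comments, 'domain:' lines disavow whole domains,
    and bare URLs disavow individual links.
    """
    lines = []
    if note:
        lines.append("# " + note)
    lines += ["domain:" + d for d in sorted(domains)]
    lines += sorted(urls)
    return "\n".join(lines) + "\n"
```

Save the result as a .txt file; that's what gets submitted to Google in the next step.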

Step 3 – Submit Disavow To Google – or – Do Outreach To Remove Links

Google wants you to make an effort and reach out to the sites to try and get the link removed. Painful? You bet. But some SEOs swear it doesn’t need to be done, claiming that a simple disavow is enough.

To disavow, take the .txt file you exported from Cognitive, add any notes you’d like for Google, and submit it through your Google Webmaster Tools account.

But, if you want to attempt to get the links removed, Buzzstream can help you! Buzzstream is like a CRM and link manager tool for inbound marketers. Easily in my top 3 SEO tools. For prospecting, one (of several) things Buzzstream can do is scan a site and pull contact information. From an email that appears deep in the site, to a contact form, Buzzstream can often locate it.

By creating an account with Buzzstream, you can upload your spreadsheet of links into it, forcing Buzzstream to try to pull contact information. Choose “match my CSV” in the upload, and tell Buzzstream your column of links should be categorized as “linking from.”
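Preparing that upload is trivial: one URL per row, with a header so you can map the column to “linking from” during import. The header name below is just an assumption; Buzzstream's "match my CSV" step lets you map whatever you call it:

```python
import csv

def write_upload_csv(urls, path):
    """Write one link per row with a header row for column mapping on upload."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Linking From"])  # the column you map to "linking from"
        writer.writerows([u] for u in urls)
```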

Here’s a sample. Notice the email, phone, and social icons? This is a huge help in contacting these webmasters and asking for the link to be removed.

buzzstream screenshot


That’s all there is to it. For anyone who has done disavows in the past, and found it excruciating (as I used to), this will hopefully give you some tips to speed up the process. Of course, if you’re not in the mood to do any of this yourself, there are certainly SEO companies happy to do this work for you.

Any questions about these steps? Email me or leave a comment below.