How Did I Beat The Duplicate Content Game?
(There have been a few updates to this article at the end; the title of this article has been changed to reflect all the data. I highly recommend you read the comments as well).
Yesterday I posted an article on quick link wins from Moz’s new Fresh Web Explorer. I happened to catch the announcement of the tool and tested it immediately. I wrote up a quick post about an hour later. There were comments from Twitter, inbound.org, and my own blog about how fast I produced the article.
Unfortunately, my domain didn’t make the first page. But two sites that republished my article did. My post was the canonical version – Google is supposed to figure that out, right? Especially since my page was indexed before the other two. Let’s look at this more deeply.
My posts get republished by Business 2 Community. They hand-pick posts from my feed that might suit their members. Yahoo is a publishing partner of B2C, so Yahoo in turn republishes some of B2C’s posts. If you look at the image above, both of those domains are ranking for my article. Authorship didn’t help me here (not that I expected it to), and the links back to my site didn’t clue Google in. Nor did B2C or Yahoo put a canonical tag in place. From the looks of it, I appear to have been beaten by sheer domain authority. Not only that, I appear to have been completely filtered out of the first 100 ranks.
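A cross-domain `rel="canonical"` tag is the one signal that would have told Google outright which copy is the original. As a minimal sketch (the URLs below are made up for illustration, and real pages would be fetched over HTTP first), here is how you could check any republished page for one using only the standard library:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of any <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            if (attrs.get("rel") or "").lower() == "canonical":
                self.canonical = attrs.get("href")


def find_canonical(html):
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical


# A republished page that credits the original would carry a tag like this:
page = '<head><link rel="canonical" href="https://example.com/original-post/"></head>'
print(find_canonical(page))  # https://example.com/original-post/

# Neither B2C nor Yahoo had one, so the lookup comes back empty:
print(find_canonical("<head><title>Republished copy</title></head>"))  # None
```

With no canonical tag anywhere, Google is left to guess which copy is the source – and here it guessed by domain authority.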
To me, this is Google doing a poor job.
So it got me thinking – what else can I do to signal to Google that my original post should be shown in place of one of these re-publishers? I could ask B2C to remove my posts, citing duplicate content issues, but I like the visibility I get there (and on Yahoo).
The Long Shot
If you look at my single post pages, my template actually removes the time stamp. It has the date, but not the actual hour the post went live. Could that be the magic bullet to get Google to value my original post higher?
As of 10:20am (of day 2), I have coded the time stamp into my WordPress single-post template. Again, I think this is a long shot. Because it’s easily faked, would Google actually factor it in?
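The actual change lives in PHP inside the WordPress single-post template (via template tags like `the_time()`), but the markup the template ends up emitting can be sketched language-neutrally. A minimal Python sketch, with a made-up publish time, of the machine-readable `<time>` element a crawler would see:

```python
from datetime import datetime, timedelta, timezone


def time_stamp_markup(published: datetime) -> str:
    """Render an HTML5 <time> element: human-readable text plus a
    machine-readable ISO-8601 datetime attribute a crawler can parse."""
    return '<time datetime="{}">{}</time>'.format(
        published.isoformat(),
        published.strftime("%B %d, %Y at %I:%M %p"),
    )


# Hypothetical publish time (US Eastern, UTC-5, assumed for illustration):
published = datetime(2013, 3, 7, 10, 20, tzinfo=timezone(timedelta(hours=-5)))
print(time_stamp_markup(published))
# <time datetime="2013-03-07T10:20:00-05:00">March 07, 2013 at 10:20 AM</time>
```

The `datetime` attribute is the part that matters to a machine; the visible text is just for readers.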
Now we wait to see if Google actually pays attention to the posted time. I’m also going to “Fetch as Google” and submit to the index again, since some think that might work as an old-school ping. Can’t hurt.
Success. Google decided to list me on the first page today (a fresh cache is listed for today, March 8th), right under a great post by Rhea at Outspoken Media. The Yahoo listing still exists, but the blended News listing (Business 2 Community) has dropped.
So other than adding the time stamp (my long shot), what changed?
Well, let’s check FWE to start. According to the tool, I got two new linking root domains (aside from the Yahoo and B2C links). One is from the result right above me, the strong Outspoken Media. Even as I sing FWE’s praises, I know it can’t catch all the links out there. There may be more. Additionally, Yahoo and B2C probably received links too (at this time, it’s still too soon to see in OSE, Majestic, or Ahrefs).
Second, since the news vertical dropped off, it could specifically have been my barrier to entry. While that algorithm runs differently from Google’s general search algorithm, I could understand an IFTTT type of scenario occurring. Perhaps Google applies a rule like, “if three copies of the same post appear on a page, then kill the least authoritative.” If the freshness of the news vertical times out, maybe my site is granted its appropriate return. This still doesn’t speak highly of Google’s internal canonicalization abilities.
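That guessed rule can be made concrete. A minimal sketch – the threshold, the authority scores, and the URLs below are all invented for illustration, and this is my speculation about Google’s behavior, not its actual algorithm:

```python
from collections import defaultdict


def filter_duplicates(results, threshold=3):
    """Guessed rule: if `threshold` or more copies of the same content land
    on one results page, drop the least authoritative copy."""
    by_content = defaultdict(list)
    for result in results:
        by_content[result["content_id"]].append(result)

    kept = []
    for copies in by_content.values():
        if len(copies) >= threshold:
            weakest = min(copies, key=lambda r: r["authority"])
            copies = [r for r in copies if r is not weakest]
        kept.extend(copies)
    # Rank whatever survives by raw domain authority.
    return sorted(kept, key=lambda r: r["authority"], reverse=True)


serp = [
    {"url": "yahoo.example/fwe-post",       "content_id": "fwe-post", "authority": 100},
    {"url": "b2c.example/fwe-post",         "content_id": "fwe-post", "authority": 80},
    {"url": "mysite.example/fwe-post",      "content_id": "fwe-post", "authority": 30},
    {"url": "outspokenmedia.example/links", "content_id": "other",    "authority": 90},
]

# With three copies on the page, my (lowest-authority) URL gets filtered out:
print([r["url"] for r in filter_duplicates(serp)])
```

Under this rule, once the B2C news listing times out only two copies remain, the threshold is no longer met, and my URL returns – which matches what I observed.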
So What’s My Best Guess?
Correlation doesn’t equal causation, so I have to go with my gut until I can get more information. Currently I suspect the answer lies in one of the three explanations above.
I’m publishing this post now, but expect to come back to it as I think a little more through it. Would love to see your thoughts in the comments!
Update 3/28/2013: Well, it’s been about a month, and my page no longer ranks for the term. The Yahoo duplicate content listing still does (on the first page as of this writing). It looks like the QDF (query deserves freshness) boost and any internal canonicalization Google may do have worn off. Some of the web pages now dominating are strong, unique pieces. Some are low quality.
Quite disappointing. FAIL… and I’ve updated the title of this post accordingly.
At the very least, hopefully this post is useful for someone in the same situation to understand more about how Google currently handles this issue. I urge you to read the comments, as more information is contained there.
Update 11/17/2013: Much time has passed. I’ve been noticing that duplicate content issues have seemed less and less dangerous for some of my clients. In the past couple months I saw Google start getting it right for two clients in particular, who struggled with some of the same issues I noted above.
I remembered this post and decided to do the query again. Now the duplicate pages are completely out of the index, and my URL is the first (and only ranking) piece. It came back. I’m quite pleased, actually.
It looks like Google may have gotten its act together a bit more in the recent months.
Once again, I’ve updated the title of this post accordingly.