Duplicate content: Myths, causes & solutions

Suppose you put a lot of hard work into a damn good article, but when it comes to public reach your blog does not do well. One common reason is that the same text is being displayed at other URLs as well. Search engines such as Google have trouble handling duplicate content.

What is duplicate content?

Duplicate content is content that appears at more than one URL on the web. If the same content shows up at many URLs, the search engine can't tell which one to prioritize in its results. As a result, all of those URLs lose audience, because preference may be given to other web pages.

Here, we'll mainly focus on the myths, technical causes, and solutions related to duplicate content. In general, duplicate content works against the goals of digital marketing.

Illustration with an example

Duplicate content is like two signboards at a crossroads that point down different roads to the same destination. So, which road should you take? Users don't care which copy is the original; they just want the information. A search engine, however, has to pick which page to show for the searched keywords because, of course, no user wants to see the same content twice.

Let's say an article about 'keyword A' appears on http://www.demo.in/keyword-A/ and the same content also appears on http://www.demo.in/article-types/keyword-A/. This is not a fictitious situation; it happens in many content management systems. Now let's say lots of other writers pick up your article, and some of them link to the first URL while others link to the second. This is where duplicate content becomes your problem: the links are split between two different URLs. If they were all linking to the same URL, your chances of ranking for 'keyword A' would be higher.

If you want to know whether your pages are suffering from duplicate content issues, you can use one of the various duplicate content discovery tools.

Causes of duplicate content

There are many causes of duplicate content, and most of them are technical. People don't often decide to put the same content in two different places without making clear which is the original, unless they've copied a post and published it by accident. Otherwise, it feels unnatural to us.

The technical causes arise because developers don't think like an end user or a search engine; they think like programmers. Taking the 'keyword A' example above, a web developer will say the article isn't duplicated at all and only exists once.

Understanding the concept of URL

The developer may sound like he is talking nonsense, but he isn't wrong; he is just speaking in technical terms. A CMS typically gives each article a unique ID in the database, but the website's routing allows the same article to be retrieved through several different URLs. To the developer, the unique identifier for the article is its database ID; to a browser or a search engine, the URL is the unique identifier, and that difference is where the confusion starts. After reading this article, you can suggest a solution to your website's developers.
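The mismatch between database IDs and URLs can be sketched in a few lines of Python. The route table and article IDs below are hypothetical and not taken from any particular CMS; the point is only that one database ID can sit behind several URLs.

```python
from collections import defaultdict

# Hypothetical route table: URL path -> article ID in the CMS database.
ROUTES = {
    "/keyword-A/": 42,
    "/article-types/keyword-A/": 42,
    "/keyword-B/": 43,
}

def urls_per_article(routes):
    """Group URL paths by the article ID they resolve to."""
    grouped = defaultdict(list)
    for path, article_id in routes.items():
        grouped[article_id].append(path)
    return dict(grouped)

def duplicated_articles(routes):
    """Return only the articles that are reachable at more than one URL."""
    return {
        article_id: paths
        for article_id, paths in urls_per_article(routes).items()
        if len(paths) > 1
    }
```

For the developer, article 42 exists once; for a search engine, the two paths that lead to it are two separate pages.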

Session IDs

First of all, let's discuss the somewhat confusing technical term "session ID". Some websites keep track of each visitor's activity, and to do that they give every visitor a session. The unique identifier of that session, the session ID, needs to be stored somewhere. The most common solution is to store it in a cookie. However, search engines usually don't store cookies.

At that point, some systems fall back to putting the session ID in the URL, which generates new URLs on every visit and results in duplicate content.

Tracking and sorting parameters

Using URL parameters that do not change the content of a page is another major cause of duplicate content, for instance in tracking links. To a search engine, http://www.demo.in/keyword-A/ and http://www.demo.in/keyword-A/?source=rss are not the same URL. The latter might let you track which source visitors came from, but it might also make it harder for the page to rank well, which is very much an unwanted side effect!

This applies not only to tracking parameters but to every parameter you can add to a URL that does not change the content.
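As a rough illustration, here is a minimal Python sketch that removes content-neutral parameters before two URLs are compared. The parameter names in IGNORED_PARAMS are assumptions for this example; each site has to decide for itself which parameters never change the page.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of parameters that never change the page content on this site.
IGNORED_PARAMS = {"source", "sessionid", "utm_source", "utm_medium", "utm_campaign"}

def strip_ignored_params(url):
    """Return the URL without tracking or session parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this helper, the tracked URL and the plain URL from the example above reduce to the same string, which is the form you would want to declare as canonical.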

Order of parameters

Not using clean URLs is another CMS-related issue. Instead, the CMS uses URLs like /?id=1&cat=2, where id refers to the article and cat refers to the category. The URL /?cat=2&id=1 will return the same result in most web systems, but to a search engine the two are different URLs.
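One way a site could make parameter order irrelevant is to sort the parameters before generating or comparing URLs. This is only a sketch using Python's standard library; the function name sort_query_params is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def sort_query_params(url):
    """Return the URL with its query parameters in alphabetical order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

If every internal link is written in this sorted form, the same id and cat values always produce a single URL regardless of the order in which the code assembled them.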

HTTP vs. HTTPS or WWW vs. non-WWW pages

If your site can be reached at both "site.com" and "www.site.com" (i.e. with and without the www prefix) and both versions serve the same content, congrats, you have successfully created duplicate web pages. The same applies to sites that are available over both http:// and https://.
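A minimal sketch of settling on one scheme and one host is shown below. The preferred values (https and www.site.com) are assumptions for the example, and ports or credentials in the URL are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"       # assumption: the site prefers HTTPS
PREFERRED_HOST = "www.site.com"  # assumption: the site prefers the www host

def _without_www(host):
    """Drop a leading 'www.' so both host variants compare equal."""
    return host[4:] if host.startswith("www.") else host

def preferred_url(url):
    """Rewrite any scheme/www variant of our own site to the preferred one."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if _without_www(host) != _without_www(PREFERRED_HOST):
        return url  # a different site: leave it untouched
    return urlunsplit(parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST))
```

In practice the non-preferred variants would be 301-redirected to the URL this function returns, so that only one version is ever served.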

URLs are case sensitive

Google sees URLs as case-sensitive.

That means these three URLs are all different:

https://example.com/page
https://example.com/PAGE
https://example.com/pAgE

Bing, however, does not seem to behave the same way; it appears to treat all URLs as lowercase.

Solutions for avoiding duplicate content

301 Redirecting

This is the most common and general solution for avoiding duplicate content. Because it is often impossible to stop a system from generating duplicate URLs, it is usually better to 301-redirect the duplicate pages to the original one (the canonical URL).

Canonical URLs (conceptual solution)

Now we know that URLs are what cause duplicate content problems in SEO, and that these problems can be solved. If you ask one person for the 'correct' URL of an article, they will give you an answer, but if you ask 3 different people in the organization, you may get 3 different answers. That is why every piece of content needs one agreed 'correct' URL: its canonical URL.
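To make the idea concrete, here is a minimal WSGI sketch in Python that answers known duplicate paths with a 301 pointing at the canonical path. The REDIRECTS table is hypothetical, and a real site would normally configure such redirects in its web server or CMS rather than in application code like this.

```python
# Hypothetical map of duplicate paths to their canonical path.
REDIRECTS = {
    "/article-types/keyword-A/": "/keyword-A/",
}

def app(environ, start_response):
    """Minimal WSGI app: 301 for known duplicates, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # Permanent redirect: tells browsers and crawlers the page has moved.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"article body"]
```

The app can be tried locally with the standard library's wsgiref.simple_server.make_server; requesting the duplicate path should then answer with the Location header of the canonical one.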

Keep an eye on similar content

Duplicate content is not only content that is copied word for word from some other source. In fact, Google itself defines duplicate content as content that either completely matches other content or is appreciably similar to it.
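To get a rough feel for "appreciably similar", two texts can be compared with Python's difflib. This is only an illustration: it is not how Google measures similarity, and the 0.9 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumption: tune this for your own content

def similarity(text_a, text_b):
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()

def looks_duplicated(text_a, text_b, threshold=SIMILARITY_THRESHOLD):
    """Flag two texts as near-duplicates when they are similar enough."""
    return similarity(text_a, text_b) >= threshold
```

Running looks_duplicated over pairs of your own pages is a cheap way to spot articles that were lightly reworded rather than genuinely rewritten.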

Using the canonical link element in <head>

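Sometimes you can't get rid of a duplicate version of an article, even when you know it's a duplicate. For this case, search engines support the canonical link element. It goes in the <head> section of the duplicate page and, for the example article above, looks like this:

<link rel="canonical" href="http://www.demo.in/keyword-A/">

When a search engine that supports the canonical element finds this link, it treats it much like a soft 301 redirect: most of the link value gathered by the duplicate page is credited to the original URL. Visitors themselves are not redirected.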
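To check which canonical URL a page actually declares, the element can be read back out of the HTML. The following is a small sketch using Python's built-in html.parser; the class and function names are made up for this example.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")

def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Feeding the HTML of each duplicate URL through find_canonical should return the same original URL every time; if it returns None or different values, the canonical element is missing or inconsistent.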

Linking back to the original content

If you can't apply the solution above because you don't control the <head> section of the site where your content appears, adding a link back to the original article above or below the copy is always a good idea. If Google encounters several links pointing to the original article, it will usually work out which version is the canonical one.

Myths regarding duplicate content

There are a few misconceptions about duplicate content in the SEO community:

‘Same text on multiple pages’ is duplicate content.

Some people in the community think that any text repeated on different pages is duplicate content. In practice, the duplicate content problem arises when the same content is accessible from different URLs (which may happen because of the causes discussed above).

Blocking ‘crawlers’ from duplicate web pages

Why shouldn't we keep duplicate URLs from being indexed by blocking them in robots.txt?

Even though this would save a lot of computing resources (especially for big e-commerce websites with a large number of filter pages), it is not what Google recommends:

“Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can’t crawl pages with duplicate content, they can’t automatically detect that these URLs point to the same content and will, therefore, effectively have to treat them as separate, unique pages.

A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects.”

The best practice is to use rel="canonical" to specify the preferred URL.
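In line with the recommendation quoted above, it can be worth checking that your robots.txt is not accidentally blocking duplicate URLs. The sketch below uses Python's urllib.robotparser; the robots.txt text and URLs passed in are whatever your own site uses, and "Googlebot" as the default user agent is just an assumption.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text blocks for a crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Any duplicate URL that shows up in the returned list cannot be crawled, so the search engine never gets to see its canonical link element or its redirect.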

How much duplicate content is acceptable?

From an SEO point of view, duplicate content is definitely not good practice. A small amount is acceptable as long as it does not hurt the site's analytics or the user experience, but it should be removed as far as possible. It undermines the integrity of the original content.

Google can identify the original content

There has been a lot of discussion online about whether or not Google is able to identify the original creator of a piece of content.

Some people say Google relies on the publication date to trace the original author; however, multiple instances of hijacked search results (a scraper website outranking the original) contradict that.

Dan Petrovic of Dejan SEO once ran several convincing experiments, which established that when the scraper page has a higher PageRank (PR) than the original page, it is likely to outrank the original page in search.

Plus, there are several other grey-area scenarios in which Google is unsure which version of the page to show in search results.

So, according to Dan Petrovic, there are certain signals you can send Google to let it know you are the original author. These are:

Claim your Google Authorship
Specify canonical URLs
Share a freshly published piece on Google+, etc.

Key Takeaways And Conclusion

Duplicate content is everywhere, but it is not a problem that can't be solved. You need to keep an eye on it, and fixing it can be very rewarding. Deliver quality content, take the precautions described above, and your website will have a much better chance of climbing the rankings.

Author

Jaydeep Patel

I started my SEO journey in 2014.
