Importance: High
Error «Duplicate Pages» Description
Indicates compliant pages that are complete duplicates of each other, i.e. their entire HTML code is identical. URLs in this report are grouped by the 'Page Hash' parameter.
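The idea behind grouping by a page hash can be illustrated with a minimal sketch: hash the full HTML of each page and collect URLs that share a hash. This is an assumption-based illustration, not the tool's actual implementation. SHA-256 is an arbitrary choice here (the report does not say which hash function 'Page Hash' uses), and the compliance check is not modeled.

```python
import hashlib
from collections import defaultdict


def page_hash(html: str) -> str:
    # Hash the entire HTML code: any difference, even one character, changes the hash.
    # Assumption: SHA-256; the real 'Page Hash' algorithm is not specified.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose full HTML is identical; return only groups of 2+ URLs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[page_hash(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: URL -> full HTML of the page.
pages = {
    "https://example.com/": "<html><body>Home</body></html>",
    "https://www.example.com/": "<html><body>Home</body></html>",
    "https://example.com/about": "<html><body>About</body></html>",
}
print(group_duplicates(pages))
# → [['https://example.com/', 'https://www.example.com/']]
```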
The Importance of the Problem
Duplicate pages occur when the same page is available at different addresses: for instance, with and without www, over different protocols (http/https), with and without a trailing '/' symbol, etc.
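To see how such address variants collapse onto one page, here is a minimal sketch that normalizes a URL by forcing https, dropping a leading "www." and removing a trailing '/'. These rules are illustrative assumptions: they presume the https, no-www, no-trailing-slash form is the preferred one, and some sites legitimately serve different content at /path and /path/.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Map common duplicate variants of a URL onto one key (illustrative rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same host
    path = parts.path.rstrip("/") or "/"  # ignore a trailing '/', keep the root as '/'
    # Assumption: https is the preferred protocol; the fragment is dropped.
    return urlunsplit(("https", host, path, parts.query, ""))


variants = [
    "http://example.com/catalog",
    "https://www.example.com/catalog/",
    "https://example.com/catalog",
]
print({normalize(u) for u in variants})
# → {'https://example.com/catalog'}
```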
It's difficult for search engines to determine which of the duplicate addresses to index and show in search results. As a result, less important pages may rank higher than the main ones. This may lead to low rankings of important pages, traffic loss, and even removal of these pages from search results.
Large sites are particularly affected by duplicates: search robots might spend all their crawling resources on them, leaving nothing for important pages. As a result, many important pages may not get into the search index, and the site will lose traffic. If there are many duplicate pages, search engines might lower the rankings of the whole site (for instance, this is how the Google Panda algorithm works).
How to Fix the Error
Choose the main URL among the duplicates and set up a 301 redirect from the other addresses to it. For useless URLs (e.g. /index.php and /index.html), returning a 404 or 410 status code is also acceptable. In either case, make sure the site does not link to redirecting or unavailable pages.
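Redirects are usually configured in the web server itself, but the logic can be sketched as a small Python WSGI application. This is a minimal sketch under stated assumptions: `CANONICAL_HOST`, `GONE_PATHS` and `resolve` are hypothetical names, the main address is assumed to be https without www, and the Host header is assumed to carry no port.

```python
CANONICAL_HOST = "example.com"              # hypothetical main host (https, no www)
GONE_PATHS = {"/index.php", "/index.html"}  # useless URLs: answer 410 instead of redirecting


def resolve(scheme: str, host: str, path: str, query: str = ""):
    """Return (status, location) for a request; location is set only for a 301."""
    if path in GONE_PATHS:
        return "410 Gone", None
    if scheme != "https" or host != CANONICAL_HOST:
        # Permanent redirect straight to the main URL, keeping path and query.
        target = f"https://{CANONICAL_HOST}{path}"
        if query:
            target += f"?{query}"
        return "301 Moved Permanently", target
    return "200 OK", None


def app(environ, start_response):
    """Minimal WSGI app that applies resolve() to every request."""
    status, location = resolve(
        environ.get("wsgi.url_scheme", "http"),
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO") or "/",
        environ.get("QUERY_STRING", ""),
    )
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if location:
        headers.append(("Location", location))
    start_response(status, headers)
    return [status.encode("ascii")]
```

Redirecting every variant in a single hop to the final URL avoids redirect chains, which is also why internal links should point at the main URL directly.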
If duplicates cannot be fixed this way, or these URLs must remain on the site (for instance, addresses with parameters for web analytics), specify the main URL for them using the <link rel="canonical"> tag or the 'Link: rel="canonical"' HTTP response header.
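For URLs with web analytics parameters, the canonical address can be derived by stripping those parameters. The sketch below assumes analytics parameters are the ones starting with `utm_` (a hypothetical rule, adjust to your site) and builds both the `<link rel="canonical">` tag and the equivalent `Link` response header.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: analytics parameters start with utm_


def canonical_url(url: str) -> str:
    """Drop analytics parameters, keeping all other query parameters in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped; an empty query leaves no trailing '?'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """<link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


def canonical_header(url: str) -> tuple[str, str]:
    """Equivalent HTTP response header, e.g. for non-HTML documents."""
    return ("Link", f'<{canonical_url(url)}>; rel="canonical"')


url = "https://example.com/catalog?utm_source=news&page=2"
print(canonical_tag(url))
# → <link rel="canonical" href="https://example.com/catalog?page=2">
print(canonical_header(url))
# → ('Link', '<https://example.com/catalog?page=2>; rel="canonical"')
```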