Find and Fix Index Coverage Errors in Google Search Console
Google Search Console's Index Coverage report provides feedback on your site’s crawling and indexing process.
The reported issues break down into four statuses:
- Valid
- Valid with warnings
- Error
- Excluded
Each status consists of different issue types that zoom in on specific issues Google has found on your site.
As you know, Google Search Console is an essential part of every SEO’s toolbox.
Among other things, Google Search Console reports on your organic performance and on how Google fared when crawling and indexing your site. The latter topic is covered in the Index Coverage report, which is what this article is all about.
After reading this article, you’ll have a firm understanding of how to leverage the Index Coverage report to improve your SEO performance.
Before we dig in, here’s a brief primer on discovering, crawling, indexing, and ranking:
- Discovering: in order to crawl a URL, search engines first need to discover it. There are various ways they can do so, such as: following links from other pages (both on-site and off-site) and processing XML sitemaps. URLs that have been discovered are then queued for crawling.
- Crawling: during the crawling phase, search engines request URLs and gather information from them. Once a URL’s response has been received, it’s handed over to the indexer, which handles the indexing process.
- Indexing: during indexing, search engines try to make sense of the information produced by the crawling phase. To put it somewhat simply, during indexing a URL’s authority and relevance for queries are determined.
When URLs are indexed, they can appear in search engine result pages (SERPs).
Let this all sink in for a minute.
This means your pages can only appear in the SERPs if they made it successfully through the last phase: indexing.
What is the Google Search Console Index Coverage report?
When Google crawls and indexes your site, they keep track of the results and report on them in Google Search Console’s Index Coverage report.
It’s basically feedback on the more technical details of your site’s crawling and indexing process. In case they detect pressing issues, they send notifications. These notifications are usually delayed though, so don’t rely on them alone to learn about high-impact SEO issues.
Google's feedback is categorized into four statuses:
- Valid
- Valid with warnings
- Excluded
- Error
When should you use the Index Coverage report?
Google says that if your site has fewer than 500 pages, you probably don’t need to use the Index Coverage report. For sites like this, they recommend using the site: operator.
We strongly disagree with this.
If organic traffic from Google is essential to your business, you do need to use the Index Coverage report, because it provides detailed information and is much more reliable than using the site: operator to debug indexing issues.
The Index Coverage report explained
The screenshot above is from a fairly large site with lots of interesting technical challenges.
Find your own Index Coverage report by following these steps:
- Log on to Google Search Console.
- Choose a property.
- Click Coverage under Index in the left navigation.
The Index Coverage report distinguishes among four status categories:
- Valid: pages that have been indexed.
- Valid with warnings: pages that have been indexed, but which contain some issues you may want to look at.
- Excluded: pages that weren’t indexed because search engines picked up clear signals they shouldn’t index them.
- Error: pages that couldn’t be indexed for some reason.
Each status consists of one or more types. Below, we’ll explain what each type means, whether action is required, and if so, what to do.
Valid URLs
As mentioned above, “Valid” URLs are pages that have been indexed. The following two types fall within the “Valid” status:
- Submitted and indexed
- Indexed, not submitted in sitemap
Submitted and indexed
These URLs were submitted through an XML sitemap and subsequently indexed.
Action required: none.
Indexed, not submitted in sitemap
These URLs were not submitted through an XML sitemap, but Google found and indexed them anyway.
Action required: verify whether these URLs need to be indexed, and if so, add them to your XML sitemap. If not, make sure you implement the robots noindex directive, and optionally exclude them through your robots.txt if they can cause crawl budget issues.
If you have an XML sitemap, but you simply haven’t submitted it to Google Search Console, all URLs will be reported with the type: “Indexed, not submitted in sitemap” – which is a bit confusing.
It makes sense to split the XML sitemap into smaller ones for large sites (say 10,000+ pages), as this helps you quickly gain insight into any indexability issues per section or content type.
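If you want to check this at scale, you can compare the URLs Google reports under this type against what's actually in your XML sitemap. Below is a minimal sketch, assuming a one-column CSV export of the reported URLs (indexed_urls.csv) and a sitemap URL; both are placeholders for your own.

```python
# Minimal sketch: find indexed URLs that are missing from your XML sitemap.
# "indexed_urls.csv" (a one-column export of the reported URLs) and the
# sitemap URL are placeholders; swap in your own.
import csv
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"

# Collect all <loc> entries from the sitemap.
with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
sitemap_urls = {loc.text.strip() for loc in tree.findall(".//sm:loc", ns) if loc.text}

# Read the URLs reported as "Indexed, not submitted in sitemap".
with open("indexed_urls.csv", newline="") as f:
    indexed_urls = {row[0].strip() for row in csv.reader(f) if row}

missing = indexed_urls - sitemap_urls
print(f"{len(missing)} indexed URL(s) not found in the sitemap:")
for url in sorted(missing):
    print(url)
```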
Valid URLs with warnings
The “Valid with warnings” status contains only two types:
- “Indexed, though blocked by robots.txt”
- “Indexed without content”
Indexed, though blocked by robots.txt
Google has indexed these URLs, but they were blocked by your robots.txt file. Normally, Google wouldn’t have indexed these URLs, but apparently they found links to these URLs and thus went ahead and indexed them anyway. It’s likely that the snippets that are shown are suboptimal.
Please note that since January 2021, this overview also contains URLs that were submitted through XML sitemaps.
Action required: review these URLs, update your robots.txt, and possibly apply robots noindex directives.
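To review these URLs in bulk, you can test them against your live robots.txt before touching anything. Here's a rough sketch using Python's built-in robotparser; the domain and URLs are placeholders.

```python
# Minimal sketch: check whether Googlebot is allowed to fetch a set of URLs
# according to the live robots.txt. The domain and URLs are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

urls_to_check = [
    "https://www.example.com/checkout/",
    "https://www.example.com/blog/some-article/",
]

for url in urls_to_check:
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{'ALLOWED' if allowed else 'BLOCKED'}  {url}")
```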
Indexed without content
Google has indexed these URLs, but Google couldn't find any content on them. Possible reasons for this could be:
- Cloaking
- Google couldn't render the page, for example because it was blocked and received an HTTP status code 403.
- The content is in a format Google doesn't index
- An empty page was published.
Action required: review these URLs to double-check whether they really don't contain content. Use both your browser and Google Search Console's URL Inspection Tool to determine what Google sees when requesting these URLs. If everything looks fine, simply request reindexing.
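As a quick first pass before the URL Inspection Tool, you can request a reported URL with a Googlebot user-agent yourself and look at the status code and how much HTML comes back. A rough sketch; the URL is a placeholder, and this only shows the raw response, not Google's rendered view.

```python
# Minimal sketch: fetch a URL with a Googlebot user-agent and report the
# status code and raw HTML size, as a first check for "Indexed without content".
# Note: this inspects the raw response only, not Google's rendered view.
import requests

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

url = "https://www.example.com/some-page/"  # placeholder
response = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10)

print(f"Status code: {response.status_code}")
print(f"HTML length: {len(response.text)} characters")
```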
Excluded URLs
The “Excluded” status contains the following types:
- Alternate page with proper canonical tag
- Blocked by page removal tool
- Blocked by robots.txt
- Blocked due to access forbidden (403)
- Blocked due to other 4xx issue
- Blocked due to unauthorized request (401)
- Crawl anomaly
- Crawled - currently not indexed
- Discovered - currently not indexed
- Duplicate without user-selected canonical
- Duplicate, Google chose different canonical than user
- Duplicate, submitted URL not selected as canonical
- Excluded by ‘noindex’ tag
- Not found (404)
- Page removed because of legal complaint
- Page with redirect
- Soft 404
Alternate page with proper canonical tag
These URLs are duplicates of other URLs, and are properly canonicalized to the preferred version of the URL.
Action required: if these pages shouldn’t be canonicalized, change the canonical to make it self-referencing. Additionally, keep an eye on the number of pages listed here. If you’re seeing a big increase while your site hasn’t grown that much in indexable pages, you could be dealing with a poor internal link structure and/or a crawl budget issue.
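To verify what these pages actually canonicalize to, you can pull the canonical link element and compare it to the URL itself. A rough sketch, assuming the requests and beautifulsoup4 packages; the URL is a placeholder.

```python
# Minimal sketch: extract the canonical link element from a page and check
# whether it is self-referencing. The URL is a placeholder.
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/category/?sort=price"
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

canonical_tag = soup.find("link", attrs={"rel": "canonical"})
canonical_href = canonical_tag.get("href") if canonical_tag else None

if canonical_href is None:
    print("No canonical link element found")
elif canonical_href.rstrip("/") == url.rstrip("/"):
    print(f"Self-referencing canonical: {canonical_href}")
else:
    print(f"Canonicalized to another URL: {canonical_href}")
```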
Blocked by page removal tool
These URLs are currently not shown in Google’s search results because of a URL removal request. When URLs are hidden in this way, they are hidden from Google’s search results for 90 days. After that period, Google may bring these URLs back up to the surface.
The URL removal request feature should only be used as a quick, temporary measure to hide URLs. We always recommend taking additional measures to truly prevent these URLs from popping up again.
Action required: send Google a clear signal that they shouldn’t index these URLs via the robots noindex directive and make sure that these URLs are recrawled before the 90 days expire.
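How you add that noindex directive depends on your stack; one common option is the X-Robots-Tag HTTP header. Purely as an illustration, here's a minimal Flask sketch that sets the header on a removed URL's response (Flask and the example route are assumptions, not a recommendation for your setup).

```python
# Illustrative sketch only: serve a removed URL with an X-Robots-Tag: noindex
# response header, so the removal sticks once the 90-day window expires.
# Flask and the example route are assumptions about your stack.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/removed-page/")
def removed_page():
    response = make_response("<html><body>Page content here.</body></html>")
    response.headers["X-Robots-Tag"] = "noindex"
    return response

if __name__ == "__main__":
    app.run(debug=True)
```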
Blocked by robots.txt
These URLs are blocked because of the site’s robots.txt file and are not indexed by Google. This means Google has not found signals strong enough to warrant indexing these URLs. If they had, the URLs would be listed under “Indexed, though blocked by robots.txt”.
Action required: make sure there aren’t any important URLs among the ones listed in this overview.
Blocked due to access forbidden (403)
Google wasn't allowed to access these URLs and received a 403 HTTP response code.
Action required: make sure that Google (and other search engines) have unrestricted access to URLs you want to rank with. If URLs that you don't want to rank with are listed under this issue type, then it's best to just apply the noindex directive (either in the HTML source or HTTP header).
Blocked due to other 4xx issue
Google couldn't access these URLs because they returned 4xx response codes other than 401, 403, and 404. This can happen with malformed URLs, for example, which sometimes return a 400 response code.
Action required: try fetching these URLs using the URL inspection tool to see if you can replicate this behavior. If these URLs are important to you, investigate what’s going on, fix the issue and add the URLs to your XML sitemap. If you don't want to rank with these URLs, then just make sure you remove any references to them.
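To replicate the issue outside the URL Inspection tool, a small batch check of the reported URLs can show which response codes they currently return. A rough sketch; the URL list is a placeholder.

```python
# Minimal sketch: batch-check a list of reported URLs and print the HTTP
# status code each one returns. The URLs are placeholders.
import requests

urls = [
    "https://www.example.com/some-page/",
    "https://www.example.com/another-page/",
]

for url in urls:
    try:
        status = requests.get(url, timeout=10, allow_redirects=False).status_code
    except requests.RequestException as exc:
        status = f"request failed ({exc.__class__.__name__})"
    print(f"{status}  {url}")
```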
Blocked due to unauthorized request (401)
These URLs are inaccessible to Google because upon requesting them, Google received a 401 HTTP response, meaning they weren’t authorized to access the URLs. You’ll typically see this for staging environments, which are made inaccessible to the world using HTTP Authentication.
Action required: make sure there aren’t any important URLs among the ones listed in this overview. If there are, you need to investigate why, because that would be a serious SEO issue. If your staging environment is listed, investigate how Google found it, and remove any references to it. Remember, both internal and external links can be the cause of this. If search engines can find those, it's likely visitors can as well.
Crawl anomaly
🛎️ Crawl anomaly type has been retired
With January 2021's Index Coverage update, the crawl anomaly issue type has been retired. Instead, you'll now find more specific issue types, such as “Blocked due to access forbidden (403)” and “Blocked due to other 4xx issue”.
These URLs weren’t indexed because Google encountered a “crawl anomaly” when requesting them. Crawl anomalies typically mean Google received response codes in the 4xx and 5xx range that aren’t listed with their own types in the Index Coverage report.
Action required: try fetching some URLs using the URL inspection tool to see if you can replicate the issue. If you can, investigate what’s going on. If you can’t find any issues and everything works fine, keep an eye on it, as it can just be a temporary issue.
Crawled - currently not indexed
These URLs were crawled by Google, but haven’t been indexed (yet). Possible reasons why a URL may have this type:
- The URL was recently crawled, and is still due to be indexed.
- Google knows about the URL, but hasn’t found it important enough to index it, for instance because it has few to no internal links, duplicate content, or thin content.
Action required: make sure there aren’t important URLs among the ones in this overview. If you do find important URLs, check when they were crawled. If it’s very recent, and you know these URLs have enough internal links to be indexed, it’s likely they will be indexed soon.
Discovered - currently not indexed
These URLs were found by Google but haven’t been crawled (and therefore indexed) yet. Google knows about them, and they’re queued for crawling. This can be because Google requested these URLs but wasn’t successful because the site was overloaded, or simply because they haven’t gotten around to crawling them yet.
Action required: keep an eye on this. If the number of URLs increases, you might be having crawl budget issues: your site is demanding more attention than Google wants to spend on it. This can be because your site doesn’t have enough authority or is too slow or often unavailable.
Duplicate without user-selected canonical
These URLs are duplicates according to Google. They aren’t canonicalized to the preferred version of the URL, and Google thinks these URLs aren’t the preferred versions. Therefore, they’ve decided to exclude these URLs from their index.
Oftentimes, you’ll find PDF files that are 100% duplicates of other PDFs among these URLs.
Action required: add canonical URLs that point to the preferred versions of these URLs, such as a product detail page. If these URLs shouldn’t be indexed at all, make sure to apply the noindex directive through the meta robots tag or X-Robots-Tag HTTP header. When you’re using the URL Inspection tool, Google may even show you the canonical version of the URL.
Duplicate, Google chose different canonical than user
Google found these URLs on its own and considers them duplicates. Even though you canonicalized them to your preferred URL, Google chooses to ignore that and apply a different canonical.
You’ll often find that Google selects different canonicals on multi-language sites with highly similar pages and thin content.
Action required: Use the URL inspection tool to learn which URL Google has selected as the preferred URL and see if that makes more sense. For instance, it’s possible Google has selected a different canonical because it has more links and/or more content.
Duplicate, submitted URL not selected as canonical
You’ve submitted these URLs through an XML sitemap, but they don’t have a canonical URL set. Google considers them duplicates of other URLs, and has therefore chosen to canonicalize them to Google-selected canonical URLs.
Please note that this type is very similar to type Duplicate, Google chose different canonical than user, but is different in two ways:
- You explicitly asked for Google to index these pages.
- You haven’t defined canonical URLs.
Action required: add proper canonical URLs that point to the preferred version of the URL.
When performing website migrations, it’s a common best practice to keep the XML sitemap that contains the old URLs available to speed up the migration process. These old URLs will be listed under “Duplicate, submitted URL not selected as canonical” as long as they are included in the XML sitemap. After removing them from the XML sitemap, the URLs will move to the “Page with redirect” type.
Excluded by ‘noindex’ tag
These URLs haven’t been indexed by Google because of the noindex directive (either in the HTML source or HTTP header).
Action required: make sure there aren’t important URLs among the ones listed in this overview. If you do find important URLs, remove the noindex directive and use the URL Inspection tool to request indexing. Also double-check whether there are any internal links pointing to these pages, as you don’t want these noindexed pages to be publicly available.
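To quickly see where a noindex is coming from, you can check both the X-Robots-Tag response header and the meta robots tag. A rough sketch, assuming the requests and beautifulsoup4 packages; the URL is a placeholder.

```python
# Minimal sketch: report whether a URL carries a noindex directive in the
# X-Robots-Tag header and/or the meta robots tag. The URL is a placeholder.
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/important-page/"
response = requests.get(url, timeout=10)

header_value = response.headers.get("X-Robots-Tag", "")
meta_tag = BeautifulSoup(response.text, "html.parser").find(
    "meta", attrs={"name": "robots"}
)
meta_value = meta_tag.get("content", "") if meta_tag else ""

print(f"X-Robots-Tag header: {header_value or '(not set)'}")
print(f"Meta robots tag:     {meta_value or '(not set)'}")
if "noindex" in header_value.lower() or "noindex" in meta_value.lower():
    print("-> noindex directive found")
```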
Please note that, if you want to make pages inaccessible, the best way to go about that is to implement HTTP authentication.
Not found (404)
These URLs weren’t included in an XML sitemap, but Google found them somehow and can’t index them because they returned an HTTP status code 404. It’s possible Google found these URLs through other sites, or that these URLs existed in the past.
Action required: make sure there aren’t important URLs among the ones listed in this overview. If you do find important URLs, restore the contents on these URLs or 301 redirect the URL to the most relevant alternative. If you don't redirect to a highly relevant alternative, this URL is likely to be seen as a soft 404.
Page removed because of legal complaint
These URLs were removed from Google’s index because of a legal complaint.
Action required: make sure that you’re aware of every URL that’s listed in this overview, as someone with malicious intent may have requested that your URLs be removed from Google’s index.
Page with redirect
These URLs are redirecting, and are therefore not indexed by Google.
Action required: none.
When you’re involved in a website migration, this overview of redirecting pages comes in handy when creating a redirect plan.
Soft 404
These URLs are considered soft 404 responses, meaning that the URLs don’t return an HTTP status code 404, but the content gives the impression that it is in fact a 404 page, for instance by showing a “Page can’t be found” message. Alternatively, these errors can be the result of redirects pointing to pages that Google considers not relevant enough. Take for example a product detail page that’s been redirected to its category page, or even to the home page.
Action required: if these URLs are real 404s, make sure they return a proper 404 HTTP status code. If they’re not 404s at all, then make sure the content reflects that.
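A quick way to check how your site handles genuinely missing pages is to request a URL that certainly doesn't exist and look at the status code it returns. A rough sketch; the domain and made-up path are placeholders.

```python
# Minimal sketch: request a URL that certainly doesn't exist and verify the
# site returns a real 404 rather than a 200 "page not found" page (a soft 404).
# The domain and path are placeholders.
import requests

bogus_url = "https://www.example.com/this-page-should-not-exist-12345/"
response = requests.get(bogus_url, timeout=10, allow_redirects=False)

if response.status_code == 404:
    print("OK: missing pages return a proper 404")
elif response.status_code == 200:
    print("Possible soft 404: missing page returns 200")
else:
    print(f"Missing page returns {response.status_code}")
```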
Error URLs
The “Error” status contains the following types:
- Redirect error
- Server error (5xx)
- Submitted URL blocked by robots.txt
- Submitted URL blocked due to other 4xx issue
- Submitted URL has crawl issue
- Submitted URL marked ‘noindex’
- Submitted URL not found (404)
- Submitted URL seems to be a Soft 404
- Submitted URL returned 403
- Submitted URL returns unauthorized request (401)
Redirect error
These redirected URLs can’t be crawled because Google encountered redirect errors. Here are some examples of potential issues Google may have run into:
- Redirect loops
- Redirect chains that are too long (Google follows five redirects per crawl attempt)
- Redirect to a URL that’s too long
Action required: investigate what’s going on with these redirects and fix them. Here’s how to easily check your HTTP status codes so you can start debugging them.
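To see what a redirect actually resolves to, and in how many hops, you can follow it while recording the chain. A rough sketch using requests; the URL is a placeholder, and a redirect loop will surface as a TooManyRedirects exception.

```python
# Minimal sketch: follow a redirect and print the full chain of hops, to spot
# overly long chains or loops. The URL is a placeholder.
import requests

url = "https://www.example.com/old-page/"
try:
    response = requests.get(url, timeout=10)  # follows redirects by default
    chain = [r.url for r in response.history] + [response.url]
    print(f"{len(response.history)} redirect hop(s):")
    print("  ->  ".join(chain))
    if len(response.history) > 5:
        print("Warning: chain is longer than the five redirects Google follows per crawl attempt")
except requests.TooManyRedirects:
    print(f"Redirect loop (or a very long chain) detected for {url}")
```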
Server error (5xx)
These URLs returned a 5xx error to Google, stopping Google from crawling them.
Action required: investigate why the URL returned a 5xx error, and fix it. Oftentimes, you see that these 5xx errors are only temporary because the server was too busy. Keep in mind that the user-agent making the requests can influence what HTTP status code is returned, so make sure to use Googlebot’s user-agent.
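Since the response can differ per user-agent, it's worth comparing what the server returns to a regular browser user-agent with what it returns to Googlebot's. A rough sketch; the user-agent strings and URL are illustrative.

```python
# Minimal sketch: compare the status code returned to a browser-like user-agent
# with the one returned to a Googlebot user-agent. All values are illustrative.
import requests

url = "https://www.example.com/some-page/"
user_agents = {
    "Browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in user_agents.items():
    status = requests.get(url, headers={"User-Agent": ua}, timeout=10).status_code
    print(f"{name:<10} {status}")
```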
Submitted URL blocked by robots.txt
You submitted these URLs through an XML sitemap, but they weren’t indexed because Google is blocked from crawling them through the robots.txt file. This type is highly similar to two other types we’ve already covered above.
Here’s how this one is different:
- If the URLs had been indexed, they would have been listed under “Indexed, though blocked by robots.txt”.
- If the URLs had not been submitted through an XML sitemap, they’d be listed under the “Blocked by robots.txt” type.
These are subtle differences, but a big help when it comes to debugging issues like these.
Action required:
- If there are important URLs listed, make sure you prevent them from being blocked through the robots.txt file. Find the robots.txt directive by selecting a URL, and then clicking the TEST ROBOTS.TXT BLOCKING button on the right-hand side.
- URLs that shouldn’t be accessible to Google should be removed from the XML sitemap.
Submitted URL blocked due to other 4xx issue
You submitted these URLs through an XML sitemap, but Google received 4xx response codes other than 401, 403, and 404.
Action required: try fetching these URLs using the URL inspection tool to see if you can replicate the issue. If you can, investigate what’s going on and fix it. If these URLs aren’t working properly and shouldn’t be indexed, then remove them from the XML sitemap.
Submitted URL has crawl issue
You submitted these URLs through an XML sitemap, but Google encountered crawl issues. This “Submitted URL has crawl issue” type is the “catch all” for crawl issues that don’t fit in any of the other types.
Oftentimes, these crawl issues are temporary in nature and will receive a “regular” classification (such as “Not found (404)”) upon re-checking.
Action required: try fetching some URLs using the URL inspection tool to see if you can replicate the issue. If you can, investigate what’s going on. If you can’t find any issues and everything works fine, keep an eye on it, as it can just be a temporary issue.
Submitted URL marked ‘noindex’
You submitted these URLs through an XML sitemap, but they’ve got the noindex directive (either in the HTML source or HTTP header).
Action required:
- If there are important URLs listed, make sure to remove the noindex directive.
- URLs that shouldn’t be indexed should be removed from the XML sitemap.
Submitted URL not found (404)
You submitted these URLs through an XML sitemap, but it appears the URLs don’t exist.
This type is highly similar to the “Not found (404)” type we covered earlier, the only difference being that in this case, you submitted the URLs through the XML sitemap.
Action required:
- If you find important URLs listed, restore their contents or 301 redirect the URL to the most relevant alternative.
- Otherwise, remove these URLs from the XML sitemap.
Submitted URL seems to be a Soft 404
You submitted these URLs through an XML sitemap, but Google considers them “soft 404s”. These URLs may be returning an HTTP status code 200 while in fact displaying a 404 page, or the content on the page gives the impression that it’s a 404.
This type is highly similar to the Soft 404 type we covered earlier, the only difference being that in this case you submitted these URLs through the XML sitemap.
Action required:
- If these URLs are real 404s, make sure they return a proper 404 HTTP status code and are removed from the XML sitemap.
- If they’re not 404s at all, then make sure the content reflects that.
Submitted URL returned 403
You submitted these URLs through an XML sitemap, but Google wasn't allowed to access these URLs and received a 403 HTTP response.
This type is highly similar to the one below, the only difference being that in the case of a 401 HTTP response, login credentials were expected.
Action required: if these URLs should be available to the public, provide unrestricted access. Otherwise, remove these URLs from the XML sitemap.
Submitted URL returns unauthorized request (401)
You submitted these URLs through an XML sitemap, but Google received a 401 HTTP response, meaning they weren’t authorized to access the URLs.
This is typically seen for staging environments, which are made inaccessible to the world using HTTP Authentication.
This type is highly similar to the “Blocked due to unauthorized request (401)” type we covered earlier, the only difference being that in this case you submitted these URLs through the XML sitemap.
Action required: investigate whether the 401 HTTP status code was returned correctly. If that’s the case, then remove these URLs from the XML sitemap. If not, then allow Google access to these URLs.
Frequently asked questions about the Index Coverage report
What information does the Index Coverage report contain?
The Index Coverage report provides feedback from Google on how they fared when crawling and indexing your website. It contains valuable information that helps you improve your SEO performance.
When should you use the Index Coverage report?
While Google says that the Index Coverage report is only useful for sites with more than 500 pages, we recommend that anyone who relies heavily on organic traffic uses it. It provides detailed information and is much more reliable than the site: operator for debugging indexing issues, so you don’t want to miss out on it.
How often should I check the Index Coverage report?
That depends on what you’ve got going on at your website. If it’s a simple website with a few hundred pages, you may want to check it once a month. If you’ve got millions of pages and add thousands of pages on a weekly basis, we’d recommend checking the most important issue types once a week.
Why are so many of my pages listed with the “Excluded” status?
There are various reasons for this, but we often see that the majority of these URLs are canonicalized URLs, redirecting URLs and URLs that are blocked through the site’s robots.txt.
Especially for large sites, that adds up quickly.
- "Indexed, though blocked by robots.txt": what does it mean and how to fix?
- Submitted URL seems to be a Soft 404: what does it mean and how to fix?
- Submitted URL has crawl issue: what does it mean and how to fix?
- Duplicate without user-selected canonical: what does it mean and how to fix?
- Crawled - currently not indexed: what does it mean and how to fix it?
- Discovered - currently not indexed: what does it mean and how to fix?
- Indexed, not submitted in sitemap: what does it mean and how to fix?
- Submitted URL marked ‘noindex’: what does it mean and how to fix?
- Crawl anomaly: what does it mean and how to fix?
- Alternate page with proper canonical tag: what does it mean and how to fix it?
- New Index coverage issue detected for site: how to fix?
- Submitted URL not found (404): what does it mean and how to fix it?
- Duplicate, submitted URL not selected as canonical: what does it mean and how to fix it?
- Duplicate, Google chose different canonical than user: what does it mean and how to fix it?