Does a robots.txt disallow instruct search engines to deindex pages?
No. A robots.txt disallow only prevents search engines from crawling a page; it does not remove the page from their index. If search engines have reason to believe the page is important, they'll keep it indexed.
Take a look at this example:
While Google isn't allowed to crawl this page, it is still indexed.
Whenever you see the description "A description for this result is not available because of this site's robots.txt", someone has blocked search engines from accessing that page with a robots.txt disallow rule, hoping they would drop it from their index.
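For reference, a disallow rule looks like the sketch below; the /private/ path is just a placeholder for illustration. It blocks crawling, nothing more:

```
# robots.txt — blocks crawling, but does NOT deindex already-indexed URLs
User-agent: *
Disallow: /private/
```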
How to handle deindexing the right way
If you want search engines to stop indexing certain pages, the first step is to communicate that clearly.
Implementing the <meta name="robots" content="noindex"> directive on pages you want deindexed is the most effective way to do this. Make sure the noindex directive has been picked up by search engines, and only add the disallow rules to your robots.txt file once the pages have disappeared from the search engines' indices. Alternatively, you can use the X-Robots-Tag HTTP header.
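A minimal sketch of both options; where exactly this goes in your templates or server configuration will differ per site:

```html
<!-- In the <head> of any page that should be removed from the index -->
<meta name="robots" content="noindex">
```

The X-Robots-Tag header, which is particularly useful for non-HTML resources such as PDFs, appears in the HTTP response like this:

```
HTTP/1.1 200 OK
X-Robots-Tag: noindex
```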
If you don't give search engines the chance to notice and process the noindex directive, they'll keep the pages in their index for quite a long time. There's no rule of thumb for how long; that depends on how important they deem the page.