There are several reasons you might want to keep a page out of the index. It's a timed promotion page; it's a PPC or other marketing page not meant for search; it's a process page (like a shopping cart) that has no value to searchers. It could be a page that is ranking but doing more harm (as noise) than good. Perhaps it's duplicate content.
Several of my clients through the years have heard that a robots.txt file is the solution. Unfortunately, it's not. By putting a page (or URL pattern) into the robots.txt file, you're only telling engines not to crawl it. If they already know about the page and have it indexed, they may still rank it; they just won't update the ranking. The same goes if you have links pointing to it: engines may rank the page with a note about robots.txt, or show only a thin listing (i.e., they make up their own title text and display no snippet).
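To see why robots.txt only governs crawling, here's a minimal sketch using Python's standard-library `urllib.robotparser`. The domain and paths are hypothetical examples; the point is that a Disallow rule blocks fetching, but says nothing about removing an already-indexed copy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a promo page (example domain and paths).
rules = [
    "User-agent: *",
    "Disallow: /promo/spring-sale.html",
]

parser = RobotFileParser()
parser.parse(rules)

# Crawling the blocked page is disallowed...
print(parser.can_fetch("*", "https://www.example.com/promo/spring-sale.html"))  # False

# ...while other pages remain crawlable. Nothing here instructs an engine
# to drop the blocked page from its index if it was indexed previously.
print(parser.can_fetch("*", "https://www.example.com/about.html"))  # True
```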
There are two effective ways to clear the index of an offending page:
The first is to make sure the page is blocked via robots.txt or returns a 404, THEN go to the URL removal tool in Google Webmaster Tools and enter the URL. If the page is blocked or removed, Google will honor the request within a few days. More info on 404s can be found here.
The second is the meta robots tag. By putting <meta name="robots" content="noindex, nofollow"> into the <head> of the offending page, you tell engines to remove that particular document from the index. You do not need to block the document with robots.txt for this; in fact, the page must remain crawlable so engines can see the tag. For more information, check out this link.
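As a quick illustration of what crawlers look for, here's a small sketch using Python's standard-library `html.parser` that detects a noindex directive in a page's head. The page markup below is a made-up example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Flags whether a page carries a meta robots noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True

# Hypothetical cart page marked for removal from the index.
page = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Shopping Cart</title>
</head><body></body></html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(parser.noindex)  # True
```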
To see whether your website has pages indexed that are wasting search engines' time, go to each engine and type a site: query:

site:www.yourdomain.com

(change www.yourdomain.com to your own domain)