Robots exclusion question.
Hi,
I have many hundreds of pages created by a database in the following format:
domain.com/availability.asp?id=26&theyear=2007&themonth=9
domain.com/availability.asp?id=26&theyear=2007&themonth=10
domain.com/availability.asp?id=26&theyear=2007&themonth=11
and so on.
I don't want these indexed as they are duplicates of each other.
What is the best way?
Should I add the robots noindex/nofollow meta to each page? Or, if I exclude /availability.asp in robots.txt, will that also disallow all of the files with the ?id= part too?
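For reference, the per-page option I mean is a robots meta tag in the head of each generated page, roughly like this (just a minimal sketch; since availability.asp produces all of these URLs, one tag in its template would cover them all):
<!-- placed in the <head> output of availability.asp so every generated URL carries it -->
<meta name="robots" content="noindex,nofollow">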
Jacine posted this at 17:17 — 26th March 2007.
Hi Hampstead,
You can actually single out certain parameters for Googlebot. Adding these lines to your robots.txt file tells Googlebot not to crawl any URLs containing the "theyear" and "themonth" parameters:
User-agent: Googlebot
Disallow: /*theyear=
Disallow: /*themonth=
If you were to exclude /availability.asp in your robots.txt file, it would also exclude the URLs with the "id" parameter.
More info on the Googlebot wildcard here: Google Robots.txt Wildcard
Hampstead posted this at 09:31 — 27th March 2007.
Thanks for this. I do not want any URLs with the id parameter indexed anyway, so perhaps excluding /availability.asp would be the way forward?
Jacine posted this at 15:30 — 27th March 2007.
You're welcome.
And yes, if you don't want the "id" pages indexed either, blocking /availability.asp as a whole is the way to go.
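Your robots.txt would then only need something like this (a minimal sketch, assuming the file sits at the root of domain.com; Disallow matches by prefix, so every ?id=, theyear= and themonth= variation of the URL is covered):
# block every URL that starts with /availability.asp, query string included
User-agent: *
Disallow: /availability.asp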