Robots exclusion question.
Hi,
I have many hundreds of pages created by a database in the following format:
domain.com/availability.asp?id=26&theyear=2007&themonth=9
domain.com/availability.asp?id=26&theyear=2007&themonth=10
domain.com/availability.asp?id=26&theyear=2007&themonth=11
and so on.
I don't want these indexed as they are duplicates of each other.
What is the best way?
Should I add the robots noindex/nofollow meta to each page? Or, if I exclude /availability.asp in robots.txt, will that also disallow all of the files with the ?id= part too?
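For reference, the per-page option I mean is a robots meta tag in the head of each generated page, roughly like this (just a minimal sketch; since availability.asp produces all of these URLs, one tag in its template would cover them all):
<!-- placed in the <head> output of availability.asp so every generated URL carries it -->
<meta name="robots" content="noindex,nofollow">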
Jacine posted this at 17:17 — 26th March 2007.
Hi Hampstead,
You can actually single out certain parameters for Googlebot. Adding these lines to your robots.txt file tells Googlebot not to crawl any URLs containing the "theyear" and "themonth" parameters:
User-agent: Googlebot
Disallow: /*theyear=
Disallow: /*themonth=
If you were to exclude /availability.asp in your robots.txt file, it would also exclude the URLs with the "id" parameter.
More info on the Googlebot wildcard here: Google Robots.txt Wildcard
Hampstead posted this at 09:31 — 27th March 2007.
Thanks for this. I do not want any URLs with the id parameter indexed anyway, so perhaps excluding /availability.asp would be the way forward?
Jacine posted this at 15:30 — 27th March 2007.
You're welcome.
And yes, if you don't want the "id" pages indexed either, blocking /availability.asp as a whole is the way to go.
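Your robots.txt would then only need something like this (a minimal sketch, assuming the file sits at the root of domain.com; Disallow matches by prefix, so every ?id=, theyear= and themonth= variation of the URL is covered):
# block every URL that starts with /availability.asp, query string included
User-agent: *
Disallow: /availability.asp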