How does Google index dynamic pages?
I have simple text links to hundreds of pages whose content is dynamically created. An example of one of these pages would be http://www.pstvalumni.com/directory/info.php?id=234, where the id number changes from page to page. What are Google's methods for indexing pages with a question mark in the address?
When I do a site-specific search on Google, some of these pages DO show up, but many others do not, and I can't figure out why, or how to get them all indexed. Best I can figure, the ones that are listed were linked recently from my home page (because they were recently updated), while the pages not found in Google (the vast majority) are linked from a page which itself has a "?" in the address. Is that why this is happening? Does Google index text from pages with a "?" but not index the pages linked from them?
What can I do to avoid this? I have one page which lists "All". Right now it is dynamically created, but do you think it would help if I made this one link point to a special page which uses a server-side include to pull in a second page that generates the dynamic links? That might fool Google, but unless somebody knows whether that is in fact my problem, it would take me weeks just to test out my theory.
vexcom posted this at 05:58 — 16th July 2004.
He has: 21 posts
Joined: Jul 2004
I think it goes by whether or not you have a hard link somewhere on your site to that page. A page linked that way, like
widgets
should get indexed.
fifeclub posted this at 13:30 — 16th July 2004.
He has: 688 posts
Joined: Feb 2001
That's what I thought at first, but if the link is generated dynamically on the server side, how would Google know whether it was hard-coded or not? Yes, some of my dynamic pages are indexed, proving that it is at the very least possible. That's why I'm leaning towards the theory that "dynamic links will get indexed only if they appear on a non-dynamically created page (no "?" in the URL)." But that's just a guess.
Suzanne posted this at 15:47 — 16th July 2004.
She has: 5,507 posts
Joined: Feb 2000
Server-side processing means that the pages are compiled on the server before they are served to any web-client, whether it's a browser or a spider.
However, to answer your question, best practice is to get more-or-less static pages indexed by making them "permanent" pages. So make a page called lastname.php and submit that. You can even automate things so those pages end up entirely hard-coded if you want, or use URL mapping instead. This is best practice for search engines AND for humans -- sensible, readable URLs.
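For example (and this is only a sketch, assuming the site runs on Apache with mod_rewrite available, which your host would need to confirm), a couple of lines in the site's .htaccess file can map a readable address onto the existing dynamic script, so a link can read /directory/234 instead of /directory/info.php?id=234:
RewriteEngine On
# serve /directory/234 through the existing dynamic script
RewriteRule ^directory/([0-9]+)/?$ /directory/info.php?id=$1 [L]
If you go that route, use the clean form in your own links so the readable version is the one spiders find and index.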
Busy posted this at 23:09 — 16th July 2004.
He has: 6,151 posts
Joined: May 2001
I use ?id=1234 on one of my sites; it has over 2,500 variations of numbers and none of them are hard-coded anywhere. To get to them you must first pass through an option (for how they are displayed), and that option also includes an id and a sub-id (? and &). I did it this way in the hope that search engines wouldn't follow them, but sadly they do.
Google, Ask Jeeves, MSN (which I recently banned), Yahoo and others have no problem finding and indexing all of them.
Google isn't too bad, as it only grabs small bunches; Ask Jeeves and MSN just go all out and grab as much as they can.
fifeclub posted this at 02:36 — 17th July 2004.
He has: 688 posts
Joined: Feb 2001
Sorry, this is off-subject, but why did you ban MSN? I took a look at the logs on one of my sites and noticed MASSIVE bandwidth usage by MSN robots. Is that why? If so, I've got a robots.txt file, but what text should I add to it to kick MSN out?
Busy posted this at 04:42 — 17th July 2004.
He has: 6,151 posts
Joined: May 2001
MSN is getting to be known as a bad bot: it doesn't always read or obey robots.txt and is still only in its development stages. The MSNbot has nothing to do with Microsoft's search. Well, it does, but the search engine doesn't use the bot's results yet.
MSN is also a very bad bandwidth hog; it sucks and sucks and sucks. I had it downloading MBs of stuff on a daily basis, grabbing files like .exe, .avb and .zip which have always been restricted in my robots.txt, and it has no reason to take files like that.
Ban it via robots.txt first; if it disobeys that, block it via .htaccess.
For robots.txt:
User-agent: msnbot
Disallow: /
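And if the bot ignores robots.txt, you can block it at the server level. This is only a sketch, assuming Apache and that the bot really identifies itself as "msnbot" in its User-Agent header; in .htaccess it would look something like:
# flag any request whose User-Agent contains "msnbot"
SetEnvIfNoCase User-Agent "msnbot" bad_bot
# refuse flagged requests, allow everyone else
Order Allow,Deny
Allow from all
Deny from env=bad_bot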