How to interpret a 404 message
I have a site that is getting a lot of 404 errors, and I understand what causes "normal" 404s.
The 404s I can't explain are these: all my image files are in an images directory, so the proper filename would be images/img01.jpg etc. I am getting a large number of 404s where the images not being found are "images\\img01.jpg" etc. (I have the raw web logs.) I have triple-checked the .html referred to and it is correct. I get about 700 uniques a day on this site, and I am getting 20-30 of these weird 404s each on quite a few different jpgs.
Any suggestions or information will be appreciated.
Busy posted this at 21:02 — 13th December 2002.
He has: 6,151 posts
Joined: May 2001
Use a program or site checker (like http://validator.w3.org/checklink) that will list all the bad links on your site, sorted by status (401, 403, 404 ...), and fix them to suit.
You could also find that some of these errors come from download bots or even hotlinking (people linking to your stuff from another site).
Another reason could be server side: if you're using anything server side, you could have used two /'s accidentally (easily done).
Just a thought, isn't '/' Windows and '\' Linux?
fdwilk1 posted this at 21:17 — 13th December 2002.
They have: 12 posts
Joined: Sep 2002
Thanks. I'm not using any SSI and I do believe this is happening because of some "bot" but I wasn't sure. Your answer does raise another issue. In all my html I had used "/" instead of "\" until I got an answer from a DMOZ editor on another forum who stated that he used an Opera browser and that his browser could not locate a file if "/" was used. Since I am still a newbie, I figured that any DMOZ editor (who stands right next to God Almighty in power) should be believed with graciousness and humility.
If these 404s are being caused by an "evil doer", I realize I can get their IP and block them, but is that potentially dangerous, since a single IP could be used by any number of surfers?
All help appreciated.
Busy posted this at 21:56 — 13th December 2002.
He has: 6,151 posts
Joined: May 2001
My version of Opera uses '/'. Maybe the Linux version uses the other way.
If it's an "evil doer", blocking an IP is an option, but be careful to check out the IP first, or you could end up blocking everyone who uses AOL, AT&T etc. You're better off using a .htaccess file to block known programs that download whole sites.
Have a look in your logs. At the end of each line there are details like this: "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)". That is where to look, because bot and program names are listed in there, like this: "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
HTTrack is a download program. Sometimes it is just the program name: "WebCopier v3.3"
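If you'd rather not eyeball the raw logs, here is a rough Python sketch of that check. It assumes the logs are in Apache's "combined" format (that part is a guess about your host), and the IPs, paths and dates in the sample lines are made up for illustration. The function name backslash_404_agents is mine too. It counts which user agents are behind the 404s that have a backslash in the path, whether it arrives raw or encoded as %5C.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern if your
# host writes a different layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def backslash_404_agents(lines):
    """Count user agents behind 404s whose path contains a backslash."""
    agents = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "404":
            continue
        path = m.group("path").lower()
        # The backslash may show up raw or URL-encoded as %5C.
        if "\\" in path or "%5c" in path:
            agents[m.group("agent")] += 1
    return agents


# Made-up sample lines; only the user-agent strings come from this thread.
sample = [
    '10.0.0.1 - - [13/Dec/2002:21:02:00 +0000] "GET /images\\img01.jpg HTTP/1.0" '
    '404 210 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
    '10.0.0.2 - - [13/Dec/2002:21:03:00 +0000] "GET /images/img01.jpg HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '10.0.0.1 - - [13/Dec/2002:21:04:00 +0000] "GET /images%5Cimg02.jpg HTTP/1.0" '
    '404 210 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
]

for agent, hits in backslash_404_agents(sample).most_common():
    print(hits, agent)
```

To run it on a real log, pass an open file instead of the sample list, e.g. backslash_404_agents(open("access.log")), where access.log stands in for whatever your log file is called. If one or two agents account for nearly all the hits, that points to a bot rather than ordinary visitors.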
The downside to the .htaccess file is that it can get BIG. The one I use blocks over 130 programs (download programs and email harvesters). The list is too big to copy here. If you want the details of my .htaccess file, email me and I'll send it.
Another thought: check your logs again and make sure it's not a search engine. Several search engines cache pages and take a while to get around to picking up the new version if you have changed it. I blocked caching on my site because MSN and Google have both choked on my pages, and I don't allow hotlinking. To stop caching, add in all your pages
Suzanne posted this at 16:51 — 14th December 2002.
She has: 5,507 posts
Joined: Feb 2000
It could be a badly written spider. If you have a custom 404 page, make sure every URL is absolute.
fdwilk1 posted this at 19:48 — 14th December 2002.
They have: 12 posts
Joined: Sep 2002
Suzanne, pardon my ignorance but by "absolute url" do you mean http://www.classic-british-cars.com/triumph-for-sale.html rather than "triumph-for-sale.html"?
Suzanne posted this at 20:47 — 14th December 2002.
She has: 5,507 posts
Joined: Feb 2000
yes!
Absolute: http://www.blah.com/whatever.ext
Relative: ../whatever.ext or whatever.ext
Relative to Root: /whatever.ext
Using either relative URL form in a custom 404 page can trap poorly written spiders, which generate all those //something.gif errors.
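To see why this matters on a 404 page, here is a small Python sketch using the standard library's urljoin. The missing-page URL is a made-up example. A 404 page is served at whatever address was requested, so relative links on it get resolved against that address, not against the place where the 404 page actually lives.

```python
from urllib.parse import urljoin

# Hypothetical address of a request that landed on the custom 404 page.
missing_page = "http://www.blah.com/old/section/gone.html"

# The link styles listed above, as they might appear on the 404 page.
links = {
    "absolute": "http://www.blah.com/whatever.ext",
    "relative": "whatever.ext",
    "relative (parent)": "../whatever.ext",
    "relative to root": "/whatever.ext",
}

# Resolve each link the way a well-behaved client would.
resolved = {kind: urljoin(missing_page, href) for kind, href in links.items()}

for kind, url in resolved.items():
    print(f"{kind:18} -> {url}")
```

The two plain relative forms end up under /old/section/ and /old/, which probably don't exist, so each one produces another 404. A standards-following client handles the root-relative form correctly, as this shows, but a sloppy spider may still get it wrong, and only the absolute form leaves it nothing to resolve.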