How to interpret a 404 message

They have: 12 posts

Joined: Sep 2002

I have a site that is getting a lot of 404 errors and I understand what causes "normal" 404's.

The 404's I can't understand are these: I have all my image files in an images directory, where the proper filename would be images/img01.jpg etc. I am getting a large number of 404's where the images that are not being found are "images\\img01.jpg" etc. (I have the raw web logs.) I have triple-checked the .html referred to and it is correct. I get about 700 uniques a day on this site, and I am getting 20-30 of these weird 404's each on quite a few different jpg's.

Any suggestions or information will be appreciated.

Busy's picture

He has: 6,151 posts

Joined: May 2001

Use a program or site checker (like http://validator.w3.org/checklink) that will give you all the bad links on your site, sorted by status (401, 403, 404 ...), and fix them to suit.

Also you could find some of these errors are from download bots or even hotlinking (people linking to your stuff from another site).

Another reason could be server side: if you're using anything server side, you could have used two /'s accidentally (easily done).

Just a thought: isn't '\' the Windows path separator and '/' the Linux one?

They have: 12 posts

Joined: Sep 2002

Thanks. I'm not using any SSI and I do believe this is happening because of some "bot" but I wasn't sure. Your answer does raise another issue. In all my html I had used "/" instead of "\" until I got an answer from a DMOZ editor on another forum who stated that he used an Opera browser and that his browser could not locate a file if "/" was used. Since I am still a newbie, I figured that any DMOZ editor (who stands right next to God Almighty in power) should be believed with graciousness and humility.
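
One quick way to double-check the pages themselves is to scan them for backslashes in image and link paths. The following is only a minimal sketch in Python using the standard library's html.parser; the names BackslashLinkFinder and find_backslash_links and the sample markup are made up for illustration, not taken from the site in question.

```python
from html.parser import HTMLParser


class BackslashLinkFinder(HTMLParser):
    """Collects href/src attribute values that contain a backslash."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (tag, attribute, value) tuples

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value can be None for attributes written without a value
            if name in ("href", "src") and value and "\\" in value:
                self.hits.append((tag, name, value))


def find_backslash_links(html_text):
    """Return every href/src value in html_text that uses a backslash."""
    finder = BackslashLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits


# Hypothetical page fragment: one bad image path, one good link.
sample = '<img src="images\\img01.jpg"><a href="triumph-for-sale.html">x</a>'

for tag, attr, value in find_backslash_links(sample):
    print(f"<{tag}> {attr} uses a backslash: {value}")
```

Any path it reports should probably be rewritten with '/', since URLs use forward slashes regardless of the server's operating system.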

If these 404's are being caused by an "evil doer", I realize I can get their IP and block them, but is that potentially dangerous, since a single IP could be used by any number of surfers?

All help appreciated.

Busy's picture

He has: 6,151 posts

Joined: May 2001

My version of Opera uses '/'; maybe the Linux version uses the other way.

If it's an "evil doer", blocking an IP is an option, but be careful to check out the IP first, or you could end up blocking everyone that uses AOL, AT&T etc. You're better off using a .htaccess file to block known programs that download whole sites.

Have a look in your logs. At the end of each line there are details like this: "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)". Those details are what you look for; bot/program names are listed in there like this: "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" (HTTrack is a download program), or just: "WebCopier v3.3".
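
If the logs are large, tallying the user-agent strings behind the 404's can be scripted. Below is a rough sketch in Python. It assumes the raw logs are in the common "combined" format (IP, timestamp, request, status, size, referrer, user agent), which may not match every server's setup. The function name count_404_agents and the sample log lines (IPs, dates, sizes) are invented for illustration; only the three user-agent strings come from the examples above.

```python
import re
from collections import Counter

# Assumed "combined" log layout:
# ip ident user [time] "request" status size "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_404_agents(lines):
    """Count how many 404 responses each user-agent string received."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("status") == "404":
            counts[match.group("agent")] += 1
    return counts


# Hypothetical log lines, made up for this example.
sample_lines = [
    '192.0.2.10 - - [01/Oct/2002:10:00:00 +0000] "GET /images/img01.jpg HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"',
    '192.0.2.20 - - [01/Oct/2002:10:00:05 +0000] "GET /images\\img01.jpg HTTP/1.0" 404 210 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
    '192.0.2.20 - - [01/Oct/2002:10:00:06 +0000] "GET /images\\img02.jpg HTTP/1.0" 404 210 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
    '192.0.2.30 - - [01/Oct/2002:10:01:00 +0000] "GET /images\\img01.jpg HTTP/1.0" 404 210 "-" "WebCopier v3.3"',
]

for agent, hits in count_404_agents(sample_lines).most_common():
    print(hits, agent)
```

To run it on a real log, pass an open file instead of sample_lines, e.g. count_404_agents(open("access.log")), where access.log is a placeholder for wherever the raw log lives.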

The downside to the .htaccess file is that it can get BIG; the one I use blocks over 130 programs (download programs and email harvesters). The list is too big to copy here. If you want the details of my .htaccess file, email me and I'll send it.

Another thought: check your logs again and make sure it's not a search engine. Several search engines cache pages and take a while to get around to picking up the new version if you have changed a page. I blocked caching on my site because MSN and Google have both choked on my pages, and I don't allow hotlinking. To stop caching, add in all your pages

Suzanne's picture

She has: 5,507 posts

Joined: Feb 2000

It could be a badly written spider. If you have a custom 404 page, make sure every URL in it is absolute.

They have: 12 posts

Joined: Sep 2002

Suzanne, pardon my ignorance but by "absolute url" do you mean http://www.classic-british-cars.com/triumph-for-sale.html rather than "triumph-for-sale.html"?

Suzanne's picture

She has: 5,507 posts

Joined: Feb 2000

yes!

Absolute: http://www.blah.com/whatever.ext
Relative: ../whatever.ext or whatever.ext
Relative to Root: /whatever.ext

Using either relative url in the custom 404 pages can trap poorly written spiders, which generate all those //something.gif errors.
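
To see why the three forms behave differently, here is a small sketch in Python using urllib.parse.urljoin, which resolves a reference the way a well-behaved client should. The missing-page URL and the helper name resolve_all are made up for illustration. A poorly written spider may resolve the relative forms differently (or wrongly), which is exactly the risk on a 404 page that can be served from any path.

```python
from urllib.parse import urljoin


def resolve_all(base_url, references):
    """Resolve each reference against base_url, as a correct client would."""
    return {ref: urljoin(base_url, ref) for ref in references}


# Hypothetical URL of a missing file that triggered the custom 404 page.
missing_page = "http://www.blah.com/some/deep/missing.html"

references = [
    "http://www.blah.com/whatever.ext",  # absolute
    "../whatever.ext",                   # relative
    "whatever.ext",                      # relative
    "/whatever.ext",                     # relative to root
]

for ref, resolved in resolve_all(missing_page, references).items():
    print(f"{ref:35} -> {resolved}")
```

Only the absolute form is independent of where the 404 page happens to be served from; the plain relative forms depend on the directory of the missing URL.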
