Filesystems and file access efficiency

They have: 447 posts

Joined: Oct 1999

Say I have a rather deep directory structure which stores stuff (templates, flat files, images, PDFs, etc.). The directory tree can go 20+ directories deep in some places, but content that is accessed regularly is typically stored within the first 5 levels.

Now, at first this may sound like a big tangled web, but it is all stored in a very logical manner. It's also all served up through a file output script which can determine the directory path to any file based on a simple algorithm, so there are no uber-long links in web pages.

Anyway, what I'm wondering is: can this cause a big performance hit? I'm not an expert on filesystems, but say someone links to a file 25 directories deep on a high-traffic site. Every time this file is accessed, the server has to find directory a, then search directory a for dir b, then search dir b for dir c, then search dir c for dir d, and so on. Am I correct in assuming a filesystem works this way, like a linked list where you must follow each link in the list to get to the end? Or does the OS determine the physical location of a file through less taxing means?
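To make the question concrete, here's a rough Python sketch (the function and the example path are made up for illustration) of what that component-by-component lookup would look like if you did it by hand in user space:

    import os

    def resolve_step_by_step(path):
        """Resolve a path one component at a time, the way a filesystem
        does on a cold lookup: search each directory's entry list for
        the next name, then descend into it."""
        parts = [p for p in path.split(os.sep) if p]
        current = os.sep if os.path.isabs(path) else "."
        for part in parts:
            with os.scandir(current) as entries:
                match = next((e for e in entries if e.name == part), None)
            if match is None:
                raise FileNotFoundError(f"{part!r} not found in {current!r}")
            current = match.path
        return current

    # Hypothetical usage with a deep path (the path itself is made up):
    # resolve_step_by_step("/var/www/a/b/c/d/e/f/g/h/myfile.pdf")

A single os.stat() call does that whole walk inside the kernel, so the question is really how much each of those per-directory lookups costs.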

On the other hand, say everything, thousands or tens of thousands of files, is stored in just a few directories. Then, rather than searching 20-30 directories that are relatively empty, one or two huge lists of files have to be searched for every single file request.

I'm thinking in most cases any efficiency hit will be minuscule, as filesystems have been tuned for a long time now. I'd guess some sort of filesystem cache is in place in most OSes, so that if the file /a/b/c/d/e/.../x/y/z/myfile.pdf is requested numerous times, the OS 'remembers' where that file is without looking it up again. That is only a guess, though.

I think I'll do some benchmark tests, but any thoughts would be appreciated.
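In case anyone wants to follow along, something like this is the kind of benchmark I have in mind (a rough Python sketch; the depth of 20, the 10,000 files, and the repeat count are arbitrary). Running the timing twice over the same path should also expose whatever name caching the OS does:

    import os
    import time
    import tempfile

    def build_deep(root, depth):
        """Create root/d0/d1/.../d(depth-1)/file.txt and return the file path."""
        path = root
        for i in range(depth):
            path = os.path.join(path, f"d{i}")
        os.makedirs(path, exist_ok=True)
        target = os.path.join(path, "file.txt")
        open(target, "w").close()
        return target

    def build_flat(root, count):
        """Create `count` files directly in root and return one in the middle."""
        os.makedirs(root, exist_ok=True)
        for i in range(count):
            open(os.path.join(root, f"f{i}.txt"), "w").close()
        return os.path.join(root, f"f{count // 2}.txt")

    def time_stats(path, repeats=10000):
        start = time.perf_counter()
        for _ in range(repeats):
            os.stat(path)
        return time.perf_counter() - start

    base = tempfile.mkdtemp()
    deep_file = build_deep(os.path.join(base, "deep"), depth=20)
    flat_file = build_flat(os.path.join(base, "flat"), count=10000)

    for label, target in (("deep", deep_file), ("flat", flat_file)):
        t1 = time_stats(target)  # first pass, possibly cold
        t2 = time_stats(target)  # second pass, should hit any OS cache
        print(f"{label}: first pass {t1:.3f}s, second pass {t2:.3f}s")

If the OS really does cache lookups the way I'm guessing, the second pass on the deep path should come out close to the flat one.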

They have: 447 posts

Joined: Oct 1999

Well, I think that while creating an environment for my benchmark I stumbled upon the answer, at least as far as NTFS is concerned.

I wrote a script to create a directory structure 50 levels deep, with a random number (between 1 and 100) of randomly named files and directories at each level. When the script got 30-ish levels deep I started getting errors like "the directory 'kdfie9_kf\jfie9\...\...' doesn't exist". Basically, what (I think) this means is that an NT filename such as 'myfile.txt' is really, internally, named 'path\to\myfile.txt'. Which is nice, I think...

I assume what happened is that I hit the filename character limit, which I'd guess is 255.
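For anyone who wants to reproduce it, here's a simplified sketch of the test in Python (not the exact script I used; it skips the random files and just nests one randomly named directory per level):

    import os
    import random
    import string

    def random_name(max_len=12):
        length = random.randint(1, max_len)
        return "".join(random.choice(string.ascii_lowercase + string.digits)
                       for _ in range(length))

    path = os.path.abspath("deep_test")
    os.makedirs(path, exist_ok=True)

    for level in range(1, 51):  # try to go 50 levels deep
        nxt = os.path.join(path, random_name())
        try:
            os.mkdir(nxt)
        except OSError as err:
            # On classic Windows/NTFS setups this typically fails once the
            # full path approaches the ~260-character MAX_PATH limit.
            print(f"Failed at level {level}, path length {len(nxt)}: {err}")
            break
        path = nxt
    else:
        print(f"Reached 50 levels; final path length {len(path)}")

On systems with longer path limits (most Linux filesystems allow paths up to 4096 bytes), this will just report reaching all 50 levels.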

Mark Hensler

He has: 4,048 posts

Joined: Aug 2000

Yes, Win machines still have that 255 (or so) character limit for the entire path.

mairving

They have: 2,256 posts

Joined: Feb 2001

Paths do matter. For optimum performance, you're best off using the shortest possible path. It probably isn't going to be all that big a hit, but it can be. As you've seen, it also makes the links rather long, which can make emailing and accessing them difficult at times.

Mark Irving
I have a mind like a steel trap; it is rusty and illegal in 47 states
