Filesystems and file access efficiency
Say I have a rather deep directory structure that stores all kinds of content (templates, flat files, images, PDFs, etc.). The directory tree can run 20+ directories deep in some places, but content that is accessed regularly is typically stored within the first 5 levels.
Now, at first this may sound like a big tangled web, but it is all stored in a very logical manner. It's also all served up through a file output script that can determine the directory path to any file with a simple algorithm, so there are no uber-long links in web pages.
Anyway, what I'm wondering is: can this cause a big performance hit? I'm not an expert on filesystems, but say someone links to a file 25 directories deep on a high-traffic site. Every time that file is accessed, the server has to find directory a, then search directory a for directory b, then search b for c, then c for d, and so on. Am I correct in assuming a filesystem works this way, like a linked list where you must follow each link to get to the end? Or does the OS determine the physical location of a file through less taxing means?
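To illustrate the linked-list idea, here's a rough Python sketch of what component-by-component lookup amounts to. This is purely conceptual (and the example path is made up); the real resolution happens inside the kernel, which also caches these lookups so repeat visits are cheap.

```python
import os

def resolve_by_hand(path):
    """Conceptual illustration: look up each path component in its
    parent directory, roughly how a filesystem resolves a path.
    Real kernels cache these lookups (e.g. Linux's dentry cache)."""
    parts = [p for p in path.split(os.sep) if p]
    current = os.sep            # start at the filesystem root
    lookups = 0
    for name in parts:
        lookups += 1
        # scan the current directory for the next component
        with os.scandir(current) as entries:
            if not any(entry.name == name for entry in entries):
                raise FileNotFoundError(f"{name!r} not found in {current!r}")
        current = os.path.join(current, name)
    return current, lookups

# e.g. resolve_by_hand("/var/www/site/templates/header.tpl")
# performs five directory scans, one per path component.
```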
On the other hand, say everything, thousands or tens of thousands of files, is stored in just a few directories. Now, rather than searching 20-30 relatively empty directories, one or two huge lists of files have to be searched for every single file request.
I'm thinking that in most cases any efficiency hit will be minuscule, as filesystems have been tuned for a long time now. I'd guess most OSs have some sort of filesystem cache in place, so that if the file /a/b/c/d/e/.../x/y/z/myfile.pdf is requested numerous times, the OS 'remembers' where it is without looking it up again. That is only a guess, though.
I think I'll do some benchmark tests, but any thoughts would be appreciated.
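Something along these lines is what I have in mind for the benchmark (a rough Python sketch; the depth of 25 and the file names are just picked for illustration). Worth noting: with warm OS caches the two timings tend to come out very close, which would back up the caching guess above, so the deep path's extra cost mostly shows up on a cold cache.

```python
import os
import tempfile
import time

def build_deep_path(root, depth):
    """Create a chain of directories `depth` levels deep and drop a
    small file at the bottom; return that file's path."""
    path = root
    for i in range(depth):
        path = os.path.join(path, "d%02d" % i)
    os.makedirs(path, exist_ok=True)
    deep_file = os.path.join(path, "myfile.pdf")
    open(deep_file, "wb").close()
    return deep_file

def time_stats(path, rounds=100000):
    """Time repeated os.stat() calls against a single path."""
    start = time.perf_counter()
    for _ in range(rounds):
        os.stat(path)
    return time.perf_counter() - start

root = tempfile.mkdtemp()

shallow_file = os.path.join(root, "myfile.pdf")
open(shallow_file, "wb").close()
deep_file = build_deep_path(root, 25)

print("shallow path:", time_stats(shallow_file))
print("deep path:   ", time_stats(deep_file))
```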
ROB posted this at 05:15 — 20th August 2002.
Well, I think that while creating an environment for my benchmark I stumbled upon the answer, at least as far as NTFS is concerned.
I wrote a script to create a directory structure 50 levels deep, with a random number (between 1 and 100) of randomly named files and directories in each. When the script got 30-ish levels deep, I started getting errors like "the directory 'kdfie9_kf\jfie9\...\...' doesn't exist". Basically, what I think this means is that an NT filename such as 'myfile.txt' is really, internally, 'path\to\myfile.txt'. Which is nice, I think...
I assume what happened is that I hit the path-length character limit, which I'd guess is 255.
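For reference, roughly the kind of script I mean, as a simplified Python sketch: it only descends a single branch, the names are made up, and on a stock Windows setup it should start failing somewhere around the 255-260 character mark (other systems may run it to completion).

```python
import os
import random
import string

def random_name(length=8):
    return "".join(random.choice(string.ascii_lowercase + string.digits)
                   for _ in range(length))

def build_tree(root, depth=50):
    """Nest randomly named directories, with 1-100 dummy files at each
    level, until we reach `depth` levels or the OS refuses the path."""
    os.makedirs(root, exist_ok=True)
    path = root
    for level in range(depth):
        for _ in range(random.randint(1, 100)):
            try:
                open(os.path.join(path, random_name() + ".txt"), "wb").close()
            except OSError as err:
                print(f"file creation failed at level {level}: {err}")
                print(f"current path length: {len(path)} characters")
                return
        path = os.path.join(path, random_name())   # descend one level
        try:
            os.mkdir(path)
        except OSError as err:
            print(f"mkdir failed at level {level}: {err}")
            print(f"attempted path length: {len(path)} characters")
            return
    print(f"reached {depth} levels without an error")

build_tree(os.path.abspath("benchmark_root"))
```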
Mark Hensler posted this at 06:38 — 20th August 2002.
Yes, Win machines still have that 255 (or so) character limit for the entire path.
mairving posted this at 11:53 — 20th August 2002.
Paths do matter. For optimum performance, you are best off using the shortest possible path. It is probably not going to be all that big of a hit, but it can be. As you have seen, deep paths also make links rather long, which can make emailing and accessing them difficult at times.
Mark Irving
I have a mind like a steel trap; it is rusty and illegal in 47 states