Re: using FileNames[] on large, deep directories
- To: mathgroup at smc.vnet.net
- Subject: [mg99397] Re: [mg99372] using FileNames[] on large, deep directories
- From: Leonid Shifrin <lshifr at gmail.com>
- Date: Tue, 5 May 2009 05:38:57 -0400 (EDT)
- References: <200905041000.GAA22648@smc.vnet.net>
Hi Michael,

To my knowledge, there is no built-in functionality that deals with your problem. Here is one possible workaround. It implements the possibility you mentioned, which does not look so bad to me.

    Module[{skip, shallowtraverse},
      (* reset the skip table: by default no directory is skipped *)
      clearSkip[] := (Clear[skip]; skip[_] = False);
      (* mark a single directory as one not to descend into *)
      setSkip[dir_String] := skip[dir] = True;
      (* visit only the immediate contents of dir *)
      shallowtraverse[dir_String, dirF_, fileF_] :=
        Scan[If[FileType[#] === Directory, dirF[#], fileF[#]] &,
          FileNames["*", dir]];
      clearSkip[];
      (* full recursive traversal; dirF is called as dirF[name, level] *)
      dtraverse[dir_String, dirF_, fileF_] :=
        Module[{travF, level = 0},
          travF := (level++; dirF[#, level];
              If[! TrueQ[skip[#]], shallowtraverse[#, travF, fileF]];
              level--) &;
          shallowtraverse[dir, travF, fileF]]
    ]; (* End external Module *)

The usage is dtraverse[directory, dirF, fileF], where dirF accepts 2 parameters (subdir_name, level) and fileF accepts 1 parameter, the file name. You write these 2 functions yourself, depending on your specific needs. I did not bother with their return values, since they will probably accomplish their main goals through side effects.

In the above code, the inner function <shallowtraverse> accepts a directory name and 2 functions that are supposed to act on directories and files respectively. They can perform any action. <shallowtraverse> visits only the level 1 subdirectories and files of the directory it is given, and <dtraverse> then uses it to realize the full directory traversal.

The functions <clearSkip>, <setSkip> and <dtraverse> are closures that share the inner function <skip> and together form the traversal interface. The full traversal is realized by using lazy evaluation to define another inner function, <travF>, in terms of itself. <dtraverse> takes the same 3 arguments as <shallowtraverse>, but here dirF is called with 2 arguments, the directory name and its level, so you can account for the (sub)directory level in your logic for <dirF>.
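The two mechanisms described above (closures sharing a Module-local symbol, and a local pure function that refers to itself) may be easier to see in isolation. The following is only a minimal sketch, independent of the file system; the names clearSeen, markSeen, seenQ and countdown are hypothetical and chosen purely for illustration.

```mathematica
(* Illustrative sketch only: clearSeen, markSeen, seenQ and countdown are
   hypothetical names, not part of the code above. *)

(* 1. Closures sharing a Module-local symbol, in the same way that
      clearSkip, setSkip and dtraverse share skip *)
Module[{seen},
  clearSeen[] := (Clear[seen]; seen[_] = False);
  markSeen[x_] := seen[x] = True;
  seenQ[x_] := seen[x];
  clearSeen[]
  ];

markSeen["a"];
{seenQ["a"], seenQ["b"]}   (* {True, False} *)

(* 2. A local pure function defined in terms of itself, like travF:
      the inner reference to step is resolved only when step is called *)
countdown[n_Integer] := Module[{step, acc = {}},
   step := (AppendTo[acc, #]; If[# > 0, step[# - 1]]) &;
   step[n];
   acc
   ];

countdown[3]   (* {3, 2, 1, 0} *)
```

The symbol seen is not reachable by name from outside the Module, so the three global functions are the only way to read or change it, which is the role skip plays in the traversal code.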
You can use <setSkip> at run-time (in particular, inside dirF/fileF themselves) to realize less trivial logic, for example when the directories to skip only become known during the traversal itself. To clear the list of dirs to be skipped, use clearSkip[].

To see quickly what it does, try calling dtraverse[yourdir, Print, Print] (but not on the huge dir :)), possibly after marking some dirs to be skipped with setSkip. To accumulate dir/file names, you can use Sow in <dirF> and <fileF>, and wrap Reap around dtraverse. Note that, although the traversal of a skipped dir and its sub-dirs is skipped, your function dirF is still applied to the skipped dir itself.

Note also that you can easily stop the traversal at any time by Throw-ing an exception in dirF or fileF once some condition is fulfilled. If you save the traversal results with Sow, you can do Reap[Catch[dtraverse[dir,dirF,fileF]]][[2,1]]. In this case you will get the results that were accumulated before the exception was thrown.

Hope this helps.

Regards,
Leonid

On Mon, May 4, 2009 at 3:00 AM, Michael <michael2718 at gmail.com> wrote:

> Hi again,
>
> I ran into a problem today using FileNames[] and was wondering if
> anybody else has encountered this problem before, and if so how they
> solved the problem.
>
> The problem is that FileNames can potentially take a long time to run
> - in my case it took about 2 hours. This is because it was reading
> the file structure of a DVD and one particular sub-directory had an
> enormous number of files scattered across about 80 directories. A
> simple option to exclude the directory would have saved about 1.9
> hours of run-time.
>
> I'm assuming I could just solve the problem by manually recursing
> directories and building up the filelist using FileNames[] for each
> directory, but it seems like it would be a lot cleaner if FileNames[]
> had additional options similar to the 'find' command (e.g. prune).
>
> --Michael
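
A note on the Sow/Reap and Throw remarks in the reply: the pattern can be tried on a plain list before pointing it at a real directory. This is only a sketch; items and visit are hypothetical stand-ins for the stream of file names that dtraverse would produce and for a user-written fileF.

```mathematica
(* Illustrative sketch: items and visit are hypothetical stand-ins for the
   file names produced by a traversal and for a user-written fileF. *)
items = {"a.txt", "b.txt", "c.log", "d.txt"};

(* Collect each name with Sow, but abort at the first ".log" file *)
visit[name_String] := (
   If[StringMatchQ[name, "*.log"], Throw["stopped at " <> name]];
   Sow[name]
   );

result = Reap[Catch[Scan[visit, items]]];

result[[1]]      (* "stopped at c.log" *)
result[[2, 1]]   (* {"a.txt", "b.txt"} *)
```

One caveat: if nothing has been sown before the Throw, Reap returns an empty list in its second slot and [[2, 1]] fails. Taking Flatten[result[[2]], 1] instead gives {} in that case.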