MathGroup Archive: May 2009 [00188]


Re: using FileNames[] on large, deep directories

To: mathgroup at smc.vnet.net
Subject: [mg99397] Re: [mg99372] using FileNames[] on large, deep directories
From: Leonid Shifrin <lshifr at gmail.com>
Date: Tue, 5 May 2009 05:38:57 -0400 (EDT)
References: <200905041000.GAA22648@smc.vnet.net>

Hi Michael,

To my knowledge, there is no built-in functionality that addresses your
problem. Here is one possible workaround. It implements the approach you
mentioned (recursing through the directories manually), which does not look
so bad to me.

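Module[{skip, shallowtraverse},
  (* skip[dir] is True for directories whose contents must not be visited *)
  clearSkip[] := (Clear[skip]; skip[_] = False);

  setSkip[dir_String] := skip[dir] = True;

  (* Visit only the immediate (level 1) contents of dir: apply dirF to
     each subdirectory and fileF to each file *)
  shallowtraverse[dir_String, dirF_, fileF_] :=
   Scan[If[FileType[#] === Directory, dirF[#], fileF[#]] &,
    FileNames["*", dir]];

  clearSkip[];

  (* Full traversal: travF is defined in terms of itself, and level
     tracks the depth of the directory currently being visited *)
  dtraverse[dir_String, dirF_, fileF_] :=
   Module[{travF, level = 0},
    travF := (level++; dirF[#, level];
       If[! TrueQ[skip[#]], shallowtraverse[#, travF, fileF]];
       level--) &;
    shallowtraverse[dir, travF, fileF]]

]; (* End external Module *)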

Usage:

dtraverse[directory, dirF, fileF], where

dirF takes 2 arguments: (subdirectory name, level), and
fileF takes 1 argument: the file name.

You write these 2 functions yourself, depending on your specific needs.
I did not bother with their return values, since they will probably
accomplish their main goals through side effects.
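For instance, here is one possible pair of callbacks, offered only as a sketch: dirF prints an indented outline of the directory tree, and fileF tallies files by extension in a counter extCount. The names dirF, fileF and extCount and the paths below are hypothetical; the callbacks are called directly here so they can be checked before a real traversal.

```mathematica
(* Hypothetical callbacks for dtraverse (names are illustrative only).
   dirF:  print the last path component, indented two spaces per level.
   fileF: tally files by extension in the counter extCount. *)
Clear[extCount];
extCount[_] = 0;
dirF = Print[StringJoin[ConstantArray["  ", #2]], FileNameTake[#1]] &;
fileF = With[{ext = FileExtension[#]}, extCount[ext] = extCount[ext] + 1] &;

(* Both callbacks can be tried on their own before a real run *)
dirF["/media/dvd/VIDEO_TS", 1];   (* prints "  VIDEO_TS" *)
fileF["/media/dvd/VIDEO_TS/VTS_01_1.VOB"];
fileF["/media/dvd/readme.txt"];
{extCount["VOB"], extCount["txt"]}
(* {1, 1} *)
```

With the definitions from the post loaded, dtraverse["/media/dvd", dirF, fileF] would then print the outline and leave the per-extension counts in extCount (the path is again only an example).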

In the code above, the inner function <shallowtraverse> takes a directory
name and 2 functions, which act on subdirectories and files respectively;
they can perform any action. It visits only the level 1 subdirectories and
files of the given directory. dtraverse then builds the full recursive
traversal on top of shallowtraverse.
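To see what "level 1" means in practice, here is a minimal sketch that copies the body of shallowtraverse into a hypothetical top-level helper levelOneScan (shallowtraverse itself is local to the Module and cannot be called from outside) and runs it on a small throwaway tree under $TemporaryDirectory. The directory and file names are made up for the demo.

```mathematica
(* Hypothetical helper levelOneScan: same body as shallowtraverse,
   defined at top level so it can be tried on its own *)
levelOneScan[dir_String, dirF_, fileF_] :=
  Scan[If[FileType[#] === Directory, dirF[#], fileF[#]] &,
   FileNames["*", dir]];

(* Build a small throwaway tree:  root/a.txt  and  root/sub/b.txt *)
root = CreateDirectory[FileNameJoin[{$TemporaryDirectory,
     "scanDemo" <> ToString[RandomInteger[10^9]]}]];
CreateDirectory[FileNameJoin[{root, "sub"}]];
Put["a", FileNameJoin[{root, "a.txt"}]];
Put["b", FileNameJoin[{root, "sub", "b.txt"}]];

(* Tag each visited entry as "dir" or "file"; sub/b.txt is never reached *)
levelOne = Map[FileNameTake,
  Flatten /@ Last[Reap[
     levelOneScan[root, Sow[#, "dir"] &, Sow[#, "file"] &],
     {"dir", "file"}]], {2}]
(* {{"sub"}, {"a.txt"}} *)

(* Clean up the demo tree *)
DeleteDirectory[root, DeleteContents -> True];
```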

The functions <clearSkip>, <setSkip> and <dtraverse> are closures that share
the inner function <skip> and together form the traversal interface. The
full traversal uses delayed evaluation to define another inner function,
<travF>, in terms of itself: travF applies dirF to a subdirectory and then
calls shallowtraverse on it with travF itself as the directory function.
dtraverse takes 3 arguments, and dirF receives 2: the directory name and its
level (1 for the immediate subdirectories of the starting directory), so
your logic in <dirF> can depend on the (sub)directory level. You can call
<setSkip> at run time, in particular from inside dirF or fileF, to implement
less trivial logic (for example, when the directories to skip only become
known during the traversal itself). To clear the list of directories to be
skipped, use clearSkip[].
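The closure and self-reference technique does not depend on the file system. As an illustration, here is a minimal sketch that applies the same structure to a nested list, where sublists stand in for directories and atoms for files. The names clearSkipT, setSkipT, shallow and walk are hypothetical stand-ins for clearSkip, setSkip, shallowtraverse and dtraverse, and the dirF passed in prunes at run time by calling setSkipT on every node it reaches at level 2.

```mathematica
(* Hypothetical demo on a nested list instead of the file system:
   sublists play the role of directories, atoms the role of files *)
Module[{skip, shallow},
  clearSkipT[] := (Clear[skip]; skip[_] = False);
  setSkipT[node_] := skip[node] = True;
  (* visit only the immediate children of a node *)
  shallow[node_List, dirF_, leafF_] :=
   Scan[If[ListQ[#], dirF[#], leafF[#]] &, node];
  clearSkipT[];
  walk[tree_List, dirF_, leafF_] :=
   Module[{travF, level = 0},
    (* travF refers to itself; Function holds its body, so the inner
       travF is only evaluated when travF is actually applied *)
    travF := (level++; dirF[#, level];
       If[! TrueQ[skip[#]], shallow[#, travF, leafF]];
       level--) &;
    shallow[tree, travF, leafF]]];

(* Prune at run time: every "directory" reached at level 2 is marked
   as skipped from inside dirF, so its children are never visited *)
clearSkipT[];
result = Reap[
    walk[{1, {2, {3}}, 4},
     (Sow[{"dir", #1, #2}]; If[#2 >= 2, setSkipT[#1]]) &,
     Sow[{"leaf", #}] &]][[2, 1]]
(* {{"leaf", 1}, {"dir", {2, {3}}, 1}, {"leaf", 2}, {"dir", {3}, 2}, {"leaf", 4}} *)
```

As in the real version, dirF is still applied to the pruned node {3}; only its children are skipped. Because skip is keyed on the node itself, two identical sublists would be pruned together, which does not matter for directories, since full path names are unique.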

To quickly see what it does, try calling dtraverse[yourdir, Print, Print]
(but not on the huge directory :)), possibly after marking some directories
to be skipped with setSkip. To accumulate directory and file names, you can
use Sow in <dirF> and <fileF>, and wrap Reap around dtraverse. Note that for
a skipped directory, the traversal of that directory and its subdirectories
is skipped, but dirF is still applied to the directory itself. Note also
that you can easily stop the traversal at any time by Throw-ing an exception
in dirF or fileF once some condition is met. If you save the traversal
results with Sow, you can use
Reap[Catch[dtraverse[dir, dirF, fileF]]][[2, 1]]
to get the results that were accumulated before the exception was thrown.
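The Reap/Catch combination can be tried without touching the disk. In this minimal sketch, a Scan over a hypothetical list of file names stands in for dtraverse, and the hypothetical function stopAfter plays the role of fileF: it records each name with Sow and throws once 3 names have been seen.

```mathematica
(* Hypothetical stand-in for fileF: record each "file" with Sow and
   abort the whole scan with Throw after the third one *)
seen = 0;
stopAfter = (Sow[#]; If[++seen >= 3, Throw["enough"]]) &;
partial = Reap[Catch[
      Scan[stopAfter, {"a.txt", "b.txt", "c.txt", "d.txt"}]]][[2, 1]]
(* {"a.txt", "b.txt", "c.txt"} *)

(* If nothing is ever sown, Reap[...][[2]] is {} and [[2, 1]] fails;
   Flatten[..., 1] returns {} in that case instead *)
safe = Flatten[Reap[Catch[Scan[stopAfter, {}]]][[2]], 1]
(* {} *)
```

Because Catch sits inside Reap, everything sown before the Throw is still collected; putting Reap inside Catch instead would lose those results when the exception escapes.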

Hope this helps.

Regards,
Leonid

On Mon, May 4, 2009 at 3:00 AM, Michael <michael2718 at gmail.com> wrote:

> Hi again,
>
> I ran into a problem today using FileNames[] and was wondering if
> anybody else has encountered this problem before, and if so how they
> solved the problem.
>
> The problem is that FileNames can potentially take a long time to run
> - in my case it took about 2 hours.  This is because it was reading
> the file structure of a DVD and one particular sub-directory had an
> enormous number of files scattered across about 80 directories.  A
> simple option to exclude the directory would have saved about 1.9
> hours of run-time.
>
> I'm assuming I could just solve the problem by manually recursing
> directories and building up the filelist using FileNames[] for each
> directory, but it seems like it would be a lot cleaner if FileNames[]
> had additional options similar to the 'find' command (e.g. prune).
>
> --Michael
>
>
