MathGroup Archive 1997

Batch Jobs - Comments?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg8209] Batch Jobs - Comments?
  • From: Mark Evans <evans at gte.net>
  • Date: Mon, 18 Aug 1997 23:24:58 -0400
  • Organization: None
  • Sender: owner-wri-mathgroup at wolfram.com

8/13/97

Roger,

It occurs to me that the batch conversion concept could be generalized.
Let me explain.

Suppose I have a folder full of data files.  It doesn't matter what kind
of data files they are.  Now posit a function

	BatchProcess[sourceDirectory,destinationDirectory,
		filterFunction,readFunction,
		processFunction,renameFunction,outputFunction,
		subFolders:(True | False),allAtOnce:(True | False)]

(My syntax is just ad-hoc, nothing sacred about it.)  The user specifies
a set of functions that do their eponymous jobs.  You could make the
functions be options instead of arguments, and assume some default
functions.  For instance, the default readFunction would just read in
all the bytes in the file as a list.  The default filterFunction would
use the form "*.*" to grab any kind of file in the directory.  The
renameFunction would decide what to call the processed output; its
argument would be the original file name.  The outputFunction would
actually do the disk write.  Its arguments would be the outputs of
renameFunction and processFunction.
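
As a rough illustration of how these pieces might fit together, here is a
minimal sketch in Python (not Mathematica, and not an existing WRI
function).  The name batch_process and all of the default functions are
assumptions made for the example; it covers only the simple one-to-one
case on a single folder, without the subFolders and allAtOnce flags.

```python
from fnmatch import fnmatch
from pathlib import Path


def batch_process(source_dir, destination_dir,
                  filter_function=lambda name: fnmatch(name, "*"),
                  read_function=lambda path: list(path.read_bytes()),
                  process_function=lambda data: data,
                  rename_function=lambda name: name,
                  output_function=lambda path, data: path.write_bytes(bytes(data))):
    """Hypothetical one-to-one batch job over the files directly in source_dir."""
    source, destination = Path(source_dir), Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    written = []
    # Materialize the listing first so new output files are never re-read.
    for path in sorted(source.iterdir()):
        if not path.is_file() or not filter_function(path.name):
            continue
        data = read_function(path)        # default: all bytes of the file as a list
        result = process_function(data)   # default: pass the data through unchanged
        target = destination / rename_function(path.name)
        output_function(target, result)   # the actual disk write
        written.append(target)
    return written
```

With this sketch, a call such as batch_process("in", "out",
rename_function=lambda n: Path(n).stem + ".wav") would keep every default
except the output name, which is the stem-plus-extension case.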

The flag subFolders indicates whether BatchProcess should also be carried
out within subfolders (maybe a number should be used indicating how deep,
Infinity for all the way).  Each subfolder represents a separate
invocation of BatchProcess.  BatchProcess will create an identical
folder structure inside the destinationDirectory as required to comply
with the subFolders specification.  This will be one of the major
advantages of BatchProcess.
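
The mirroring step could look something like the following Python sketch.
The names mirrored_targets and create_mirror and the depth parameter are
hypothetical; depth plays the role of the "how deep" number, with
math.inf standing in for Infinity.  Only folders that actually contain
files get created in this version.

```python
import math
from pathlib import Path


def mirrored_targets(source_dir, destination_dir, depth=math.inf):
    """Yield (source_file, destination_file) pairs, descending at most
    `depth` folder levels below source_dir (0 means top level only)."""
    source, destination = Path(source_dir), Path(destination_dir)
    for path in sorted(source.rglob("*")):
        relative = path.relative_to(source)
        # A file directly in source_dir has one path part, i.e. level 0.
        if path.is_file() and len(relative.parts) - 1 <= depth:
            yield path, destination / relative


def create_mirror(source_dir, destination_dir, depth=math.inf):
    """Create the folders needed under destination_dir; return the pairs."""
    pairs = list(mirrored_targets(source_dir, destination_dir, depth))
    for _, target in pairs:
        target.parent.mkdir(parents=True, exist_ok=True)
    return pairs
```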

The flag allAtOnce indicates whether the batch job should proceed
sequentially, one file at a time, or whether the data contained in all
filtered files should be read in at once.  This is a necessary flag.
For example, I am taking the bias out of an image sequence.  Each image
is one file, but there is not enough information in one image by itself
to know what its bias is relative to the whole sequence.  I take the
whole sequence in, average the whole, and then subtract the bias from
each image in the sequence relative to this common average.  On the
other hand, there are plenty of batch-type jobs that would get all the
information they need for one file from that one file.  The input to
processFunction, then, would be either a single data file's contents, or
a list of all data file contents in the folder together with a number
indicating which file is of current interest, so that Part[] could be
used properly.
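
To make the two calling conventions concrete, here is a small Python
sketch.  The names run_batch and remove_bias are hypothetical, images are
plain lists of pixel values, and the bias of an image is assumed to be
the difference between its own mean and the mean of the whole sequence,
which is one possible reading of the bias-removal example.

```python
def run_batch(contents, process_function, all_at_once=False):
    """Apply process_function to every file's contents.

    all_at_once=False: process_function(data) sees one file at a time.
    all_at_once=True:  process_function(all_data, index) sees every file,
                       plus the index of the file of current interest.
    """
    if all_at_once:
        return [process_function(contents, i) for i in range(len(contents))]
    return [process_function(data) for data in contents]


def mean(values):
    return sum(values) / len(values)


def remove_bias(images, index):
    """Shift image `index` so its mean matches the sequence-wide mean."""
    common = mean([pixel for image in images for pixel in image])
    bias = mean(images[index]) - common   # assumed definition of "bias"
    return [pixel - bias for pixel in images[index]]


# Each output image is computed from the whole sequence (many-to-one, N times).
corrected = run_batch([[1, 3], [5, 7]], remove_bias, all_at_once=True)
```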

WRI should supply typical processFunctions.  I think they could convert
to TIFFs, WAVs, or what have you.  The output filename should be
"Automatic" in these kinds of standard cases (e.g. stem + ".wav").

If the destinationDirectory is the SAME as the sourceDirectory, then
Mathematica does not create a new directory structure mirroring the
source structure.  Instead, it deposits converted/processed files in the
same directory as the source files.

There should also be ways to handle many-to-one processing.  A typical
batch job would be one-to-one.  The example I adduced above is a
many-to-one, but is executed N times.  I am referring in this paragraph
to a many-to-one that executes just once.  So the output is a single
file whose contents depend on all the files in a given directory.
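
A many-to-one job that runs once might be sketched like this in Python.
The name batch_reduce and its argument order are assumptions for the
example; the point is only that processFunction is called a single time
with the contents of every filtered file.

```python
from pathlib import Path


def batch_reduce(source_dir, output_path, read_function, process_function,
                 output_function, filter_function=lambda name: True):
    """Hypothetical many-to-one job: read all files, process once, write once."""
    paths = sorted(p for p in Path(source_dir).iterdir()
                   if p.is_file() and filter_function(p.name))
    contents = [read_function(p) for p in paths]   # everything read up front
    result = process_function(contents)            # called exactly once
    output_function(Path(output_path), result)     # single output file
    return result
```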

The readFunction should not be Mathematica code per se, although it
could be.  I am thinking more of a kind of template that tells
Mathematica what the contents of the file are.  The kernel sucks up the
whole file and interprets it according to the readFunction; the
readFunction has nothing to do with disk I/O, unless the user wants it
that way.  There should be default readFunctions that know about TIFFs
and other formats.  Some of these formats have byte-offset pointers to
their own contents, like TIFF, so the readFunction is somewhat
nontrivial.
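
One way to picture a readFunction as a template rather than as code is
the following Python sketch, built on the standard struct module.  The
helper make_read_function and the three-field header in the usage lines
are invented for illustration; this handles only fixed-layout headers,
not formats such as TIFF whose offsets point elsewhere in the file.

```python
import struct


def make_read_function(byte_order, fields):
    """Build a reader from a declarative template.

    fields is a list of (name, struct_code) pairs, where each code must
    yield exactly one value (e.g. "H" or "2s", not a repeat like "4B").
    The returned function interprets bytes already in memory and does no
    disk I/O itself.
    """
    layout = struct.Struct(byte_order + "".join(code for _, code in fields))
    names = [name for name, _ in fields]

    def read_function(raw):
        values = layout.unpack_from(raw)   # ignores any bytes past the header
        return dict(zip(names, values))

    return read_function


# Hypothetical header: 2-byte tag, then width and height as little-endian
# unsigned 16-bit integers.
read_header = make_read_function("<", [("magic", "2s"),
                                       ("width", "H"),
                                       ("height", "H")])
header = read_header(struct.pack("<2sHH", b"IM", 640, 480) + b"pixel data")
```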

All very complicated, I am sure, but all very useful.  I especially like
the idea of having Mathematica automatically generate the new directory
structure and output filenames for me.  That turns out to be rather
painful as things stand right now.

There will need to be more kinds of filtering options on filenames.  I
think right now the only special character is the wildcard "*".  You
need the single-character wildcard "?" and the other grep-style patterns
like [0-9] and [a-zA-Z]+ for this scheme to work very well.
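
For comparison, here is how the two kinds of filterFunction might look in
Python, whose standard library has both shell-style wildcards ("*", "?",
"[0-9]") and full regular expressions (needed for repetition such as
"+").  The helper names glob_filter and regex_filter and the file names
are made up for the example.

```python
import re
from fnmatch import fnmatchcase


def glob_filter(pattern):
    """Shell-style filter: supports *, ? and character classes like [0-9]."""
    return lambda name: fnmatchcase(name, pattern)


def regex_filter(pattern):
    """Regular-expression filter; the whole file name must match."""
    compiled = re.compile(pattern)
    return lambda name: compiled.fullmatch(name) is not None


names = ["img1.tif", "img12.tif", "imgA.tif", "notes.txt"]
single_char = [n for n in names if glob_filter("img?.tif")(n)]
one_digit = [n for n in names if glob_filter("img[0-9].tif")(n)]
any_digits = [n for n in names if regex_filter(r"img[0-9]+\.tif")(n)]
letters_only = [n for n in names if regex_filter(r"[a-zA-Z]+\.txt")(n)]
```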

Best regards,

Mark


