Batch Jobs - Comments?
- To: mathgroup at smc.vnet.net
- Subject: [mg8209] Batch Jobs - Comments?
- From: Mark Evans <evans at gte.net>
- Date: Mon, 18 Aug 1997 23:24:58 -0400
- Organization: None
- Sender: owner-wri-mathgroup at wolfram.com
8/13/97

Roger,

It dawns on me to suggest generalizing the batch conversion concept! Let me explain.

Suppose I have a folder full of data files. It doesn't matter what kind of data files they are. Now posit a function

    BatchProcess[sourceDirectory, destinationDirectory,
                 filterFunction, readFunction,
                 processFunction, renameFunction, outputFunction,
                 subFolders:(True | False), allAtOnce:(True | False)]

(My syntax is just ad-hoc, nothing sacred about it.)

The user specifies a set of functions that do their eponymous jobs. You could make the functions be options instead of arguments, and assume some default functions. For instance, the default readFunction would just read in all the bytes in the file as a list. The default filterFunction would use the form "*.*" to grab any kind of file in the directory. The renameFunction would decide what to call the processed output; its argument would be the original file name. The outputFunction would actually do the disk write. Its arguments would be the outputs of renameFunction and processFunction.

The flag subFolders indicates whether the BatchProcess should be carried on within subFolders (maybe a number should be used indicating how deep, Infinity for all the way). Each subFolder represents a different instantiation of BatchProcess. BatchProcess will create an identical folder structure inside the destinationDirectory as required to comply with the subFolders specification. This will be one of the major advantages of BatchProcess.

The flag allAtOnce indicates whether the batch job should proceed sequentially, one file at a time, or whether the data contained in all filtered files should be read in at once. This is a necessary flag. For example I am taking the bias out of an image sequence. Each image is one file, but there is not enough information in one image by itself to know what its bias is relative to the whole sequence.
I take the whole sequence in, average the whole, and then subtract the bias from each image in the sequence relative to this common average. On the other hand, there are plenty of batch-type jobs that would get all the information they need for one file from that one file.

The input to processFunction, then, would be either a single data file's contents, or a list of all data file contents in the folder, with a number indicating which file is the current interest, so that Part[] could be used properly.

WRI should supply typical processFunctions. I think they could convert to TIFFs, WAVs, or what have you. The output filename should be "Automatic" in these kinds of standard cases (e.g. stem + ".wav").

If the destinationDirectory is the SAME as the sourceDirectory, then Mathematica does not create a new directory structure mirroring the source structure. Instead, it deposits converted/processed files in the same directory as the source files.

There should also be ways to handle many-to-one processing. A typical batch job would be one-to-one. The example I adduced above is a many-to-one, but is executed N times. I am referring in this paragraph to a many-to-one that executes just once. So the output is a single file whose processFunction depends on all the files in a given directory.

The readFunction should not be Mathematica code per se, although it could be. I am thinking more of a kind of template that tells Mathematica what are the contents of the file. The kernel sucks up the whole file and interprets it according to the readFunction; the readFunction has nothing to do with disk I/O, unless the user wants it that way. There should be default readFunctions that know about TIFFs and other formats. Some of these formats have byte-offset pointers to their own contents, like TIFF, so the readFunction is somewhat nontrivial.

All very complicated, I am sure, but all very useful.
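
To make the proposal concrete, here is a rough sketch of how the pieces could fit together. It is written in Python rather than Mathematica, purely as an illustration; the names batch_process and remove_bias, the keyword arguments, and the defaults are all assumptions modelled on the ad-hoc signature above, not an existing API. The remove_bias function is only one possible reading of the image-sequence example (shift each image so that its mean matches the mean of the whole sequence).

```python
from fnmatch import fnmatchcase
from pathlib import Path


def batch_process(source_dir, dest_dir, *,
                  filter_fn=lambda name: fnmatchcase(name, "*.*"),
                  read_fn=lambda path: list(path.read_bytes()),
                  process_fn=lambda data, index=None:
                      data if index is None else data[index],
                  rename_fn=lambda name: name,
                  output_fn=lambda path, data: path.write_bytes(bytes(data)),
                  sub_folders=False, all_at_once=False):
    """Hypothetical analogue of the proposed BatchProcess; returns written paths."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)

    # Snapshot the source folder before writing anything, so that outputs
    # are not picked up as inputs when source and destination coincide.
    entries = sorted(source_dir.iterdir())
    files = [p for p in entries if p.is_file() and filter_fn(p.name)]
    folders = [p for p in entries
               if p.is_dir() and p.resolve() != dest_dir.resolve()]
    dest_dir.mkdir(parents=True, exist_ok=True)

    # allAtOnce: read every filtered file first, then hand the process
    # function the full list plus the index of the file of current interest.
    contents = [read_fn(p) for p in files] if all_at_once else None

    written = []
    for index, path in enumerate(files):
        if all_at_once:
            result = process_fn(contents, index)
        else:
            result = process_fn(read_fn(path))
        target = dest_dir / rename_fn(path.name)
        output_fn(target, result)
        written.append(target)

    # subFolders: each subfolder is a separate run, mirrored under dest_dir.
    if sub_folders:
        for folder in folders:
            written += batch_process(
                folder, dest_dir / folder.name,
                filter_fn=filter_fn, read_fn=read_fn, process_fn=process_fn,
                rename_fn=rename_fn, output_fn=output_fn,
                sub_folders=True, all_at_once=all_at_once)
    return written


def remove_bias(contents, index):
    """Assumed all-at-once process function: remove one image's bias
    relative to the average over the whole sequence."""
    total = sum(sum(image) for image in contents)
    count = sum(len(image) for image in contents)
    common_average = total / count
    image = contents[index]
    bias = sum(image) / len(image) - common_average
    return [value - bias for value in image]
```

A call such as batch_process("raw", "out", process_fn=remove_bias, all_at_once=True, output_fn=...) would need a custom output function, since the de-biased values are no longer bytes. The many-to-one case that executes just once is not covered by this sketch.
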
I especially like the idea of having Mathematica automatically generate the new directory structure and output filenames for me. That turns out to be rather painful as things stand right now.

There will need to be more kinds of filtering options on filenames. I think right now the only special character is the wildcard "*". You need the single-character wildcard "?" and the other grep-style patterns like [0-9] and [a-zA-Z]+ for this scheme to work very well.

Best regards,
Mark
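
As an illustration of the filename filtering asked for above, the following Python sketch (again not Mathematica code) builds two kinds of filter: shell-style wildcards, which cover "*", "?" and character classes like [0-9], and full regular expressions, which are needed for repetition such as [a-zA-Z]+. The helper names glob_filter and regex_filter are hypothetical; either result could play the role of the filterFunction argument in the proposal.

```python
import re
from fnmatch import fnmatchcase


def glob_filter(pattern):
    """Filter using shell-style wildcards: *, ?, [0-9], [!a-z]."""
    return lambda name: fnmatchcase(name, pattern)


def regex_filter(pattern):
    """Filter using a regular expression that must match the whole name."""
    compiled = re.compile(pattern)
    return lambda name: compiled.fullmatch(name) is not None


names = ["img1.tif", "img12.tif", "imgA.tif", "notes.txt"]

# "?" matches exactly one character, so img12.tif is excluded.
one_char = [n for n in names if glob_filter("img?.tif")(n)]

# A character class restricts that single character to a digit.
one_digit = [n for n in names if glob_filter("img[0-9].tif")(n)]

# Repetition ("+") is beyond shell wildcards and needs a regular expression.
any_digits = [n for n in names if regex_filter(r"img[0-9]+\.tif")(n)]
letters_only = [n for n in names if regex_filter(r"[a-zA-Z]+\.txt")(n)]
```
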