Import - a cautionary tale
- To: mathgroup at smc.vnet.net
- Subject: [mg118660] Import - a cautionary tale
- From: David Bailey <dave at removedbailey.co.uk>
- Date: Fri, 6 May 2011 07:23:29 -0400 (EDT)
The Import function is, of course, a front for a whole suite of internal importing functions - one for each type of data file. While many of these are excellent and comprehensive (the various image import operations, for example), this is not true of every data type.

Recently, a client wanted to use a word-processed document as a template for creating a report containing some Mathematica output. Of course, this would have been very easy using a notebook as the template, but for various reasons it was necessary to use an RTF or similar file. According to the documentation for Import/RTF, it is possible to import such a file as a notebook, which would have been ideal for my purpose.

I prepared a very simple RTF file (using WordPad), and was amazed that, when it was imported as a notebook, the font sizes were hugely distorted (some larger, some smaller) and the center alignment was lost. I emphasise that this was an extremely simple file, so it is hard to imagine that Import RTF->Notebook has had any testing or quality assurance whatsoever.

I then tried a number of other document formats. PDF files looked good when imported, until I realised that they were imported as page images - essentially useless for subsequent manipulation. HTML files did not import in any formatted form at all.

My advice is that if you are planning a project that will involve importing data from one of the many file types supposedly supported by Mathematica, you should test that claim early on, with a small sample file, to avoid subsequent disappointment.

I do wish WRI would devote some work to documenting the Import and Export of each data type to the standard of the documentation of other functions. Not only would this help users access the full power of Import/Export, but merely documenting these functions properly would reveal the shortcomings I have described. If certain data formats are only minimally supported, perhaps they should be placed in Developer` or Experimental` until they are fit for purpose.
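
As a rough illustration of the kind of early test suggested above, here is a minimal sketch. It assumes a small sample file, hypothetically named sample.rtf, in the current working directory; the file name is an assumption, and the "Notebook" element is the one the Import/RTF documentation mentioned above describes. The elements actually offered may differ between formats and versions, which is exactly what the first call is meant to reveal.

```mathematica
(* Hypothetical sample file; substitute a small file of the format you plan to use. *)
file = "sample.rtf";

(* Ask Import which elements it claims to support for this file. *)
Import[file, "Elements"]

(* Import as a notebook expression and check what kind of expression came back. *)
nb = Import[file, "Notebook"];
Head[nb]
```

Comparing the imported result with the original document at this stage, before any project work depends on it, should show quickly whether fonts, alignment and other formatting survive the import.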
David Bailey http://www.dbaileyconsultancy.co.uk