Re: Import - a cautionary tale
- To: mathgroup at smc.vnet.net
- Subject: [mg118677] Re: Import - a cautionary tale
- From: "Hans Michel" <hmichel at cox.net>
- Date: Sat, 7 May 2011 07:31:08 -0400 (EDT)
I would concur. I originally thought your accidental post regarding RTF was addressed to one of the "Hans" in this group, since I did answer someone's RTF question a while back:

http://groups.google.com/group/comp.soft-sys.math.mathematica/msg/fb973193ad33bd2d?hl=en&dmode=source

I did not reply to your post, as it looked accidental.

I always use custom transforms (rules and options) rather than any defaults, or at least state the option values and rules explicitly. As this experience was a result of private consulting, we may not see any solutions posted by you, but the lessons learned are a good enough warning. If the file was a word-processed template, I would have stuck with XML as a common data structure among applications. Is RTF a closed standard? Too many applications save/export RTF with extraneous data.

I have stated many times in this forum that Import is a gateway function for Mathematica, and that more work is needed by WRI to make certain that this gateway function works well. For example, the Mathematica front end should use an interface similar to the one it uses when dealing with large output (Short). Use the FrontEndResource. With an added attribute to Import, say "Interactive", the file or object being imported would be queued and the user would have the opportunity to set options and rules. For a fixed-width file, one could set the column delimiters by dragging a tab on a ruler bar. For a complicated EDI file, one could set the component, subcomponent, field, escape, and record characters. For formatted input, this interface could show a preview. If the preview is not optimal, one refines the rules and options until one gets the desired outcome.

The assumption that every user will read the manual (help documents), and that every user will be intimately familiar with the content of the file they are importing, is an obstacle to the usability of an application. Excel does this, but it also fails, because for many files it only scans the first 10-100 rows and assumes the datatypes will be the same.
So when a few columns are blank in the first few rows, one gets entire columns of blank data. I would hope that when, or if, WRI provides this type of functionality, they do it better.

Hans

-----Original Message-----
From: David Bailey [mailto:dave at removedbailey.co.uk]
Sent: Friday, May 06, 2011 6:23 AM
To: mathgroup at smc.vnet.net
Subject: [mg118677] [mg118660] Import - a cautionary tale

The Import function is, of course, a front for a whole suite of internal importing functions, one for each type of data file. While many of these are excellent and comprehensive (e.g. the various image import operations), this is not true of every data type.

Recently, a client wanted to use a word-processed document as a template for creating a report containing some Mathematica output. Of course, this would have been very easy using a notebook as the template, but for various reasons it was necessary to use an RTF or similar file. According to the documentation for Import/RTF, it is possible to import such a file as a notebook, which would have been ideal for my purpose. I prepared a very simple RTF file (using WordPad), and was amazed that, when imported as a notebook, the fonts were hugely distorted (some larger, some smaller) and the center alignment was lost. I emphasise that this was an extremely simple file, so it is hard to imagine that Import RTF->Notebook had had any testing or quality assurance whatsoever.

I then tried a number of other document formats. PDF files looked good when imported, until I realised that they were imported as page images, essentially useless for subsequent manipulation. HTML files do not import in any formatted form at all.

My advice would be that if you are planning a project that will involve importing data from one of the many file types supposedly supported by Mathematica, you should test that claim early on, to avoid subsequent disappointment.
I do wish WRI would devote some work to documenting the Import and Export of each data type to the standard of the documentation of other functions. Not only would this help users access the full power of Import/Export, but merely documenting these functions properly would reveal the shortcomings I have described. If certain data formats are only minimally supported, perhaps they should be placed in Developer` or Experimental` until they are fit for purpose.

David Bailey
http://www.dbaileyconsultancy.co.uk
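
Illustration of the row-sampling pitfall mentioned in Hans's reply: the sketch below is in Python, not Mathematica, and it does not reproduce Excel's or Import's actual logic. It only shows, under stated assumptions, how inferring column types from the first few rows can misjudge a column that is blank at the top of the file. The data, the function names (infer_type, infer_columns) and the sample size of 3 are all hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return 'blank', 'int', 'float', or 'text' for a list of raw strings."""
    non_blank = [v for v in values if v.strip() != ""]
    if not non_blank:
        return "blank"
    # Try the narrowest numeric type first, then fall back.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_blank:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def infer_columns(rows, sample_size=None):
    """Infer one type per column from the first sample_size rows,
    or from every row when sample_size is None."""
    sample = rows if sample_size is None else rows[:sample_size]
    return [infer_type(list(col)) for col in zip(*sample)]


# Hypothetical data: the second column is empty in the first three rows.
text = "id,reading\n1,\n2,\n3,\n4,7.5\n5,8.25\n"
rows = list(csv.reader(io.StringIO(text)))[1:]  # drop the header row

sampled = infer_columns(rows, sample_size=3)  # looks at rows 1-3 only
full = infer_columns(rows)                    # looks at every row

print(sampled)  # ['int', 'blank']
print(full)     # ['int', 'float']
```

An importer that trusted the sampled result would treat the second column as empty throughout. Scanning every row, or letting the user confirm the inferred types in a preview as suggested in the reply, avoids that.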