MathGroup Archive 2011

Re: Import - a cautionary tale

  • To: mathgroup at smc.vnet.net
  • Subject: [mg118677] Re: Import - a cautionary tale
  • From: "Hans Michel" <hmichel at cox.net>
  • Date: Sat, 7 May 2011 07:31:08 -0400 (EDT)

I concur. I originally thought your accidental post regarding RTF was
addressed to one of the "Hans" in this group, since I did answer someone's
RTF question a while back.

http://groups.google.com/group/comp.soft-sys.math.mathematica/msg/fb973193ad33bd2d?hl=en&dmode=source

I did not reply to your post as it looked accidental.

I always use custom transforms (rules and options) rather than relying on
defaults, or at least state the option values and rules explicitly.
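
To illustrate the habit of spelling out every parsing choice, here is a
minimal sketch. It is written in Python rather than Mathematica only so
that it is self-contained, and the sample data and the helper name
parse_decimal_comma are invented for the illustration.

```python
import csv
import io

# Hypothetical sample: semicolon-separated fields, decimal commas.
raw = "name;price\nbolt;1,50\nnut;0,25\n"

# State the delimiter and quote character instead of trusting defaults.
reader = csv.reader(io.StringIO(raw), delimiter=";", quotechar='"')
header, *records = list(reader)


def parse_decimal_comma(text):
    """Convert a number written with a decimal comma, e.g. '1,50'."""
    return float(text.replace(",", "."))


# Explicit per-column transform rule for the price column.
prices = {name: parse_decimal_comma(price) for name, price in records}
```

With the default comma delimiter the same input would be split in the
middle of each price, which is the kind of silent surprise that explicit
options avoid.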

Since this experience came out of private consulting, we may not see any
solutions posted by you, but the lessons learned are warning enough.

If the file was a word-processed template, I would have stuck with XML as a
common data structure among applications. Is RTF a closed standard? Too
many applications save/export RTF with extraneous data. I have stated many
times in this forum that Import is a gateway function for Mathematica
and that more work is needed by WRI to make certain that this gateway
function works well.

For example, the Mathematica front end could use an interface similar to the
one it uses when dealing with large output (Short). It could use
FrontEndResource. With an added option to Import, say "Interactive", the
file or object being imported would be queued and the user would have the
opportunity to set options and rules. For a fixed-width file, one could set
the column delimiters by dragging a tab on a ruler bar. For a complicated
EDI file, one could set the component, subcomponent, field, escape, and
record characters. For formatted input, this interface could show a
preview. If the preview is not optimal, one refines the rules and options
until one gets the desired outcome.
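
The fixed-width case can be sketched without any GUI: the column breaks
play the role of the tabs on the ruler bar, and a preview of the first few
lines shows whether they are in the right place. This is a rough Python
sketch, not anything Mathematica provides; split_fixed_width, preview, and
the sample lines are hypothetical names and data.

```python
def split_fixed_width(line, breaks):
    """Cut one line at the given column offsets and strip the padding."""
    edges = [0, *breaks, len(line)]
    return [line[a:b].strip() for a, b in zip(edges, edges[1:])]


def preview(lines, breaks, limit=5):
    """Parse only the first few lines so the breaks can be checked."""
    return [split_fixed_width(line, breaks) for line in lines[:limit]]


# Hypothetical sample: three columns starting at offsets 0, 5 and 15.
sample = [
    "ID   NAME      QTY",
    "001  widget     12",
    "002  sprocket    7",
]

# Adjust the breaks and rerun the preview until the columns look right.
rows = preview(sample, breaks=[5, 15])
```

An interactive importer would do the same loop, with the user dragging the
breaks instead of editing a list.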

The assumption that every user will read the manual (help documents) and
that every user will be intimately familiar with the content of the file
they are importing is an obstacle to the usability of an application. Excel
offers this kind of interface, but it also fails, because for many files it
scans only the first 10-100 rows and assumes the data types will be the same
throughout. So when a few columns are blank in the first few rows, one gets
entire columns of blank data. I would hope that when or if WRI provides
this type of functionality, they do it better.
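
The sampling failure described above is easy to reproduce in a few lines.
This is a minimal Python sketch of the general idea, not Excel's actual
algorithm; the helper names infer_type and infer_columns and the sample
rows are invented for the illustration.

```python
def infer_type(values):
    """Guess a column type from the non-blank values it is shown."""
    seen = [v for v in values if v.strip()]
    if not seen:
        return "blank"  # nothing but empty cells in the sample
    try:
        for v in seen:
            float(v)
        return "number"
    except ValueError:
        return "text"


def infer_columns(rows, sample_size=None):
    """Infer one type per column from the first sample_size rows."""
    sample = rows if sample_size is None else rows[:sample_size]
    return [infer_type(column) for column in zip(*sample)]


# Hypothetical data: the second column is empty in the first two rows.
rows = [
    ["1", ""],
    ["2", ""],
    ["3", "4.5"],
    ["4", "6.0"],
]

sampled = infer_columns(rows, sample_size=2)  # looks at two rows only
full = infer_columns(rows)                    # looks at every row
```

Here the sampled guess labels the second column blank while the full scan
finds numbers in it, so an importer that trusts the sample would discard
real data.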

Hans

-----Original Message-----
From: David Bailey [mailto:dave at removedbailey.co.uk] 
Sent: Friday, May 06, 2011 6:23 AM
To: mathgroup at smc.vnet.net
Subject: [mg118677] [mg118660] Import - a cautionary tale

The Import function is, of course, a front for a whole suite of internal 
importing functions - one for each type of data file. While many of 
these are excellent and comprehensive - e.g. the various image import 
operations, this is not true of every data type.

Recently, a client wanted to use a word processed document as a template 
for creating a report containing some Mathematica output. Of course, 
this would have been very easy using a notebook as the template, but for 
various reasons it was necessary to use an RTF or similar file.

According to the documentation for Import/RTF, it is possible to
import such a file as a notebook, which would have been ideal for my
purpose. I prepared a very simple RTF file (using WordPad), and was
amazed that, when imported as a notebook, the fonts were hugely 
distorted (some larger, some smaller) and the center alignment was lost. 
I emphasise that this was an extremely simple file, so it is hard to 
imagine that Import RTF->Notebook had had any testing or quality 
assurance whatsoever.

I then tried a number of other document formats. PDF files looked good 
when imported, until I realised that they were imported as page images - 
essentially useless for subsequent manipulation. HTML files do not 
import in any formatted form at all.

My advice would be that if you are planning a project that will involve 
importing data from one of the many file types supposedly supported by 
Mathematica, you should test that claim early on, to avoid subsequent 
disappointment.

I do wish WRI would devote some work to documenting the Import and 
Export of each data type to the standard of the documentation of other 
functions. Not only would this help users access the full power of
Import/Export, but merely documenting these functions properly would
reveal the shortcomings I have described.

If certain data formats are only minimally supported, perhaps they 
should be placed in Developer` or Experimental` until they are fit for 
purpose.

David Bailey
http://www.dbaileyconsultancy.co.uk


