MathGroup Archive: November 2010 [00669]


Re: TextRecognize tabular data

To: mathgroup at smc.vnet.net
Subject: [mg114185] Re: TextRecognize tabular data
From: telefunkenvf14 <rgorka at gmail.com>
Date: Fri, 26 Nov 2010 05:29:44 -0500 (EST)
References: <iciun9$lgu$1@smc.vnet.net>

On Nov 24, 5:59 am, Nate Dudenhoeffer <ndudenhoef... at gmail.com> wrote:
> I was just looking at some new functions in Mathematica 8, and found
> TextRecognize.  It is an OCR tool, and overall seems to work quite well.
>
> I would like to find a way to use it to get tabular data into a Mathematica
> list (even tabular data which may have columns of uneven length).   Any
> ideas on how to accomplish this?
>
> Thanks,
> Nate

1. Import the image file and slice it into the desired chunks using
ImagePartition[], then map TextRecognize over the pieces:
TextRecognize[#]&/@{your data to recognize, some more data to
recognize, and so on...} You may have to experiment with scan
resolutions to get the best result. Also, if your data are warped
due to sloppy copying/scanning, this process will be a major PITA.
See point (3) below.

2. I've played around with TextRecognize (only in dev. versions, not
the final version 8 yet) and haven't been able to get consistent
results, even with good quality images. One trick I've found is that
preprocessing with MorphologicalBinarize[] can improve accuracy. Try
a Manipulate[TextRecognize[MorphologicalBinarize[image,{a,b}]],.....]
or Manipulate[TextRecognize[MorphologicalBinarize[image,{b^5 &,
b}]],.....] and play with the parameters until you close in on
something that works best for a specific case.

3. If you have a large project, I'd recommend regular OCR software,
especially if the images need to be straightened or cleaned up prior
to OCRing.
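
For what it's worth, points (1) and (2) could be combined roughly as
in the sketch below. This is only an illustration, not something
tested on real scans: the file name "table.png", the cell size
{120, 30}, the helper name recognizeCells and the slider defaults are
all made up, and it assumes the table cells sit on a regular grid,
which a real scan may not.

```mathematica
(* Hypothetical input: replace "table.png" with your own scan. *)
img = Import["table.png"];

(* Hypothetical helper: cut the image into a grid of cells of the
   given pixel size, binarize each cell, then OCR it. ImagePartition
   returns a matrix of subimages, so mapping at level {2} keeps the
   row/column layout of the table. *)
recognizeCells[image_, {cellWidth_, cellHeight_}] :=
  Map[
    TextRecognize[MorphologicalBinarize[#]] &,
    ImagePartition[image, {cellWidth, cellHeight}],
    {2}
  ];

(* The cell size is a guess; measure it on your own image. *)
table = recognizeCells[img, {120, 30}];

(* Interactive threshold tuning in the spirit of point (2); the
   ranges and starting values are guesses. *)
Manipulate[
  TextRecognize[MorphologicalBinarize[img, {lower, upper}]],
  {{lower, 0.3}, 0, 1},
  {{upper, 0.7}, 0, 1}
]
```

The result table is a nested list of strings, one per cell, so
recognized numbers would still need something like ToExpression to
become numeric. Columns of uneven length would likely need per-column
slicing (e.g. with ImageTake) instead of a uniform grid.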

-RG
