Re: TextRecognize tabular data
- To: mathgroup at smc.vnet.net
- Subject: [mg114185] Re: TextRecognize tabular data
- From: telefunkenvf14 <rgorka at gmail.com>
- Date: Fri, 26 Nov 2010 05:29:44 -0500 (EST)
- References: <iciun9$lgu$1@smc.vnet.net>
On Nov 24, 5:59 am, Nate Dudenhoeffer <ndudenhoef... at gmail.com> wrote:
> I was just looking at some new functions in Mathematica 8, and found
> TextRecognize. It is an OCR tool, and overall seems to work quite well.
>
> I would like to find a way to use it to get tabular data into a Mathematica
> list (even tabular data which may have columns of uneven length). Any
> ideas on how to accomplish this?
>
> Thanks,
> Nate

1. Import the image file and slice it into the desired chunks using ImagePartition[], then map TextRecognize over the pieces:

   TextRecognize[#]&/@{your data to recognize, some more data to recognize, and so on...}

   You may have to experiment with scan resolutions to get the best result. Also, if your data are warped due to sloppy copying/scanning, this process will be a major PITA. See point (3) below.

2. I've played around with TextRecognize (only in dev. versions, not the final version 8 yet) and haven't been able to get consistent results, even with good-quality images. One trick I've found is that preprocessing with MorphologicalBinarize[] can improve accuracy. Try

   Manipulate[TextRecognize[MorphologicalBinarize[image,{a,b}]],.....]

   or

   Manipulate[TextRecognize[MorphologicalBinarize[image,{b^5 &, b}]],.....]

   and play with the parameters until you close in on something that works best for a specific case.

3. If you have a large project, I'd recommend regular OCR software (especially if the images need to be straightened or cleaned up prior to OCRing).

-RG
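
P.S. Points 1 and 2 might be combined roughly as follows. This is only a sketch under assumptions, not a tested recipe: the file name "table-scan.png", the row count nRows, and the thresholds 0.3 and 0.6 are all placeholders, and it assumes the table rows have equal pixel height and that TextRecognize returns a string, as it does in version 8.

```mathematica
(* Hypothetical file name; substitute your own scan. *)
img = Import["table-scan.png"];

(* Optional preprocessing. The thresholds are placeholders; tune them,
   for example interactively with Manipulate. *)
bin = MorphologicalBinarize[img, {0.3, 0.6}];

(* Assumption: the table has nRows rows of equal pixel height. *)
nRows = 10;
{w, h} = ImageDimensions[bin];

(* Full-width tiles, one per table row. ImagePartition returns a
   matrix of subimages, so flatten it into a plain list of strips. *)
strips = Flatten[ImagePartition[bin, {w, Floor[h/nRows]}]];

(* OCR each strip and split the text on whitespace. Rows may end up
   with different lengths, so the result can be a ragged list. *)
data = StringSplit[TextRecognize[#]] & /@ strips
```

Splitting on whitespace will break cells that contain spaces, and if the image height is not a multiple of nRows the leftover pixels at the bottom may be dropped, so expect to adjust the tile size by hand for real scans.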