Re: Importing "Plaintext" from PDF

*To*: mathgroup at smc.vnet.net
*Subject*: [mg113505] Re: Importing "Plaintext" from PDF
*From*: Joseph Gwinn <joegwinn at comcast.net>
*Date*: Sun, 31 Oct 2010 02:09:56 -0500 (EST)
*References*: <iagldl$3cm$1@smc.vnet.net>

In article <iagldl$3cm$1 at smc.vnet.net>,
 Mark Coleman <markspcoleman at gmail.com> wrote:

> Hi,
>
> I'm attempting to use Mathematica (v7.01) to Import the text from a PDF file.
> If I simply Import[] the file, it returns a list of graphics objects
> representing each page of the file. If I use the "Plaintext" option of
> Import[], it returns an empty list. My source PDF files were obtained
> from Google's Patent Search function. Just wondering if there is
> some option I am missing or if Mathematica cannot Import text from PDF files.

The PDF contains scans (like a fax), not text. Each page is just an image, so there is no text for Import[] to extract, which is why "Plaintext" comes back empty.

Google Patents does offer text generated by OCR of the scans, but even for straight English text the error rate is significant: at least 1% on older patents. OCR of math equations is basically hopeless. Nor are the published equations written in Mathematica.

You will have to do this manually.

Joe Gwinn