Re: Importing "Plaintext" from PDF
- To: mathgroup at smc.vnet.net
- Subject: [mg113505] Re: Importing "Plaintext" from PDF
- From: Joseph Gwinn <joegwinn at comcast.net>
- Date: Sun, 31 Oct 2010 02:09:56 -0500 (EST)
- References: <iagldl$3cm$1@smc.vnet.net>
In article <iagldl$3cm$1 at smc.vnet.net>,
Mark Coleman <markspcoleman at gmail.com> wrote:
> Hi,
>
> I'm attempting to use Mathematica (v7.01) to Import the text from a PDF file.
> If I simply Import[] the file, it returns a list of graphics objects
> representing each page of the file. If I use the "Plaintext" option of
> Import[], it returns an empty list. My source PDF files were obtained
> from Google's Patent Search function. Just wondering if there is
> some option I am missing or if Mathematica cannot Import text from PDF files.
The PDF contains scanned page images (like a fax), not text. Google Patents
has the text generated by OCR of the scans, but even for plain English text
the error rate is significant, at least 1% on older patents.
OCR of math equations is basically hopeless, and the published equations
are not written in Mathematica syntax anyway. You will have to do this manually.
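A quick way to confirm that a given file is scan-only is to ask Import for the text layer and see whether anything comes back. This is only a rough sketch: the file name "patent.pdf" is a placeholder, and the exact form of an empty result (empty list or empty string) may differ between Mathematica versions, so the check below accepts either.

```mathematica
(* Placeholder path (an assumption); replace with the actual PDF. *)
file = "patent.pdf";

(* Request only the text layer of the PDF. *)
text = Import[file, "Plaintext"];

(* Normalize the result: keep any strings and join them into one.
   An empty list, an empty string, or $Failed all end up as "". *)
textString = StringJoin[Cases[Flatten[{text}], _String]];

(* True if some non-whitespace text was extracted; False suggests
   the pages are scanned images with no text layer. *)
hasTextLayer = StringLength[StringTrim[textString]] > 0
```

A False result is consistent with a scanned document, though it does not prove it, since Import can also fail to extract text for other reasons.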
Joe Gwinn