MathGroup Archive: October 2010 [00697]

Re: Importing "Plaintext" from PDF

To: mathgroup at smc.vnet.net
Subject: [mg113493] Re: Importing "Plaintext" from PDF
From: Bill Rowe <readnews at sbcglobal.net>
Date: Sun, 31 Oct 2010 02:07:39 -0500 (EST)

On 10/30/10 at 4:36 AM, markspcoleman at gmail.com (Mark Coleman) wrote:

>I'm attempting to use Mathematica (v7.01) to Import the text from a
>PDF file. If I simply Import[] the file, it returns a list of
>graphics objects representing each page of the file. If I use the
>"Plaintext" option of Import[], it returns an empty list. My source
>PDF files were obtained from Google's Patent Search function. Just
>wondering if there is some option I am missing or if Mathematica
>cannot Import text from PDF files.

It is not at all difficult to import just the text from PDF
files into Mathematica. The basic syntax is

Import["filename",{"PDF","Plaintext"}]

This imports all of the text in the PDF file, provided the file
actually contains text. It will not help if the document was
scanned into the PDF file. In that case the pages are stored as
images, so there is no plaintext to import.
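
As a rough sketch of how one might check for this case, the
following lists the elements Import reports for the file and then
tests whether the "Plaintext" result is non-empty. The file name
"patent.pdf" and the 200-character preview length are placeholders
of my own, not anything from the original question.

```mathematica
(* "patent.pdf" is a hypothetical file name; substitute your own path. *)
file = "patent.pdf";

(* List the elements Import can extract from this file. *)
Import[file, "Elements"]

(* Import the text layer as a single string. *)
text = Import[file, {"PDF", "Plaintext"}];

(* An empty result suggests scanned page images with no text layer. *)
If[StringQ[text] && StringLength[StringTrim[text]] > 0,
  StringTake[text, Min[200, StringLength[text]]],
  "No extractable text found; the PDF may contain only scanned images."]
```

If the last expression returns the fallback message, the file
probably has no text layer, and Import alone will not recover any
text from it.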

You can get more information about the options for importing a
particular file format by searching for

guide/ImportingAndExporting

in the Documentation Center and clicking on the format of
interest in the page that is returned. Another way to get to
this page is to look up either Import or Export in the
Documentation Center and click on the Listing of Formats just to
the right of the large bold Import or Export.
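
The list of supported import formats can also be inspected from
within the kernel. This is only a small sketch; the exact contents
of the list depend on your Mathematica version.

```mathematica
(* $ImportFormats holds the names of all formats Import understands. *)
MemberQ[$ImportFormats, "PDF"]

(* Show every supported format whose name contains "PDF". *)
Select[$ImportFormats, StringMatchQ[#, "*PDF*"] &]
```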
