Re: Importing "Plaintext" from PDF
- To: mathgroup at smc.vnet.net
- Subject: [mg113493] Re: Importing "Plaintext" from PDF
- From: Bill Rowe <readnews at sbcglobal.net>
- Date: Sun, 31 Oct 2010 02:07:39 -0500 (EST)
On 10/30/10 at 4:36 AM, markspcoleman at gmail.com (Mark Coleman) wrote:

>I'm attempting to use Mathematica (v7.01) to Import the text from a
>PDF file. If I simply Import[] the file, it returns a list of
>graphics objects representing each page of the file. If I use use
>"Plaintext" option of Import[], it returns an empty list. My source
>pdf files were obtained from Google's Patent Search function. Just
>wondering if I there is some option I am missing or if Mathematica
>cannot Import text from pdf files.

It is not at all difficult to import just the text from PDF files into Mathematica. The basic syntax is

    Import["filename",{"PDF","Plaintext"}]

This will import all of the text in the PDF file, assuming the file actually contains text. It will not do anything for you if the document was scanned into the PDF file. In that case, the pages are stored as images and there is no plain text to import.

You can get more information regarding options related to importing of file formats by clicking on the file format of interest in the page that is returned by searching for guide/ImportingAndExporting in the Documentation Center. Another way to get to this page is to look up either Import or Export in the Documentation Center and click on the Listing of Formats link just to the right of the large bold Import or Export heading.
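
As a minimal sketch of the syntax above, the following checks whether a PDF has any text to import. The file name "patent.pdf" and the variable names are hypothetical, chosen only for illustration, and the empty-result check is an assumption about how a scanned file shows up, based on the empty list reported in the original question.

```mathematica
(* Hypothetical file name; replace with the path to your own PDF *)
file = "patent.pdf";

(* List the elements Import can extract from this file *)
Import[file, {"PDF", "Elements"}]

(* Import only the text of the PDF *)
text = Import[file, {"PDF", "Plaintext"}];

(* Assumption: a scanned PDF gives an empty list or an empty string *)
If[StringQ[text] && StringLength[StringTrim[text]] > 0,
  (* Show at most the first 500 characters as a quick check *)
  Print[StringTake[text, Min[500, StringLength[text]]]],
  Print["No text found; the pages may be scanned images."]
]
```

If the second branch is reached, the file most likely holds page images only, and the text would have to be recovered outside of Import, for example with a separate OCR tool.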