Re: importing fixed width data
- To: mathgroup at smc.vnet.net
- Subject: [mg97541] Re: importing fixed width data
- From: Bob F <deepyogurt at gmail.com>
- Date: Sat, 14 Mar 2009 18:17:24 -0500 (EST)
- References: <gpg1lt$cps$1@smc.vnet.net>
On Mar 14, 4:42 am, swiftset <swift... at gmail.com> wrote: > Hi, I'm trying to import data from a file where the fields are stored > in a fixed width format. The Import["file", "Table"] command is the > closest I can get to what I'd like, but then the data is divided using > whitespace, so I get a lot of extraneous fields, and no way to tell > whether what Mathematica imported as two fields is actually just one > that contained a whitespace. > > Any ideas on how to proceed? Is the problem because some of your fixed-width data fields are blank and you would like Mathematica to import a "blank" when the field is really blank, instead of thinking this "line" has fewer number of fields? Well, if you are on a Mac or Linux/Unix system you could edit the file first in a command shell window with the vi editor and change sets of continuous spaces to a comma (Mathematica will take a comma separated set of data very readily). I assume this is either a mixture of numeric and alphanumeric data or just alphanumeric data - which is it. So, for example using the vi editor on a file of data that has a fixed field width of 10, and some of the fields are all blank (and possibly several in a row) and the rest of the fields only contain numbers, first need to add a comma at the end of all non-blank-but-numeric fields with the vi command :%s/\([0-9][0-9]*\)/\1,/g then need to change fields that are all blank (assuming 10 blanks in a row) with the vi command :%s/ /,/g which would change every set of 10 blanks to a comma to indicate an empty field. But then need to remove the comma at the end of all lines (otherwise will end up with one extra field on each line when you import into Mathematica) with the vi command :%s/,$// Then you could import into Mathematica and it should figure it out and put null items in the list so that things come out in the correct order. If you are on Windows, you can download a Windows version of the "vi" program called "vim" from http://www.vim.org/download.php and do the same thing. Make sure to edit on a copy of the file in case you make a mistake. Don't mean to be so confusing with the weird syntax of the vi commands, but if you are not familiar with using wild cards and saved patterns in vi, you might want to consult your closest vi expert or look at a book or the man pages. Hope that helps. If not, post a few typical lines from your data file and I will try and make another suggestion depending on it's structure. If the data is sensitive or confidential, just dummy up something that exhibits the same exact pattern (spaces, tabs, special characters like quotes or others are important so include them in your example). For example if the data file looked like this (note that some lines have blanks at the end and some don't but all have 7 10-wide-fields so want to make sure that Mathematica ends up with a lists that are each 7 elements in length) and all fields are right-justified: 111 44 66 211 44 66 311 44 66 411 44 66 77 would get changed to this: 111,,, 44,, 66, 211,,, 44,, 66, 311,,, 44,, 66, 411,,, 44,, 66, 77 and the first, second and third lines that appear to have 3 values to Mathematica using spaces as a filler in blank fields, in the comma version look to have the actual 7 values (but some of them are null or blank) that all lines do. Just thought of another thing you could do - use Excel (the spreadsheet program) to import the data, and it has the ability to parse the data assuming a fixed field length, then export as a comma separated file and then import directly into Mathematica. See the Excel manual or help for an explanation of how to import and parse data. -Bob