MathGroup Archive 2011

Re: unable to import csv-Data

  • To: mathgroup at smc.vnet.net
  • Subject: [mg119067] Re: unable to import csv-Data
  • From: "Hans Michel" <hmichel at cox.net>
  • Date: Sat, 21 May 2011 06:45:55 -0400 (EDT)

Andre:

I believe the slowdown comes from the DateStringFormat option. If you have
time, remove it and see whether there are any speed gains (it will still be
slow).
The date string format is applied to every entry in the larger file. That is
why I moved the option to the end, for easy removal.
Your dates are already close to ISO 8601 format, so they can be ordered
correctly after a simple removal of the "-". Also, the EmptyField option
should be used to transform empty fields to "Null".
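
A minimal sketch of the hyphen-removal idea, assuming a few made-up sample
dates in the same yyyy-mm-dd layout as the BELEG_DAT column. The names dates,
keys and sorted are illustrative and do not come from the thread.

```mathematica
(* Hypothetical sample values in the yyyy-mm-dd layout of the BELEG_DAT column *)
dates = {"2010-11-01", "2008-01-01", "2009-06-15"};

(* Remove the hyphens; StringReplace threads over a list of strings *)
keys = StringReplace[dates, "-" -> ""]
(* {"20101101", "20080101", "20090615"} *)

(* Turn each key into an integer and sort the original strings by it *)
sorted = SortBy[dates, FromDigits[StringReplace[#, "-" -> ""]] &]
(* {"2008-01-01", "2009-06-15", "2010-11-01"} *)
```

Because the dates stay plain strings during the import, no per-entry date
parsing is needed; the conversion runs only on the column that is sorted.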

I would place such a large dataset in a database. Mathematica can connect
to an Access database or to HSQL. Excel imports flat files in much the same
way as Access does.


-----Original Message-----
From: Andre Koppel [mailto:akoppel at akso.de]
Sent: Friday, May 20, 2011 6:19 AM
To: mathgroup at smc.vnet.net
Subject: [mg119067] [mg119056] Re: unable to import csv-Data

Hi Hans,
I have tried your Import parameters. With my complete dataset it didn't
work.
The result was a strange mix of cells.
Currently I am reading the whole file and doing the conversion manually
using regular expressions.
Even if the CSV import worked, it is horribly slow (more than 1000
seconds with your Import parameters).
It would be nice if there were a CSV import function that accepts a list
describing the contents of the columns. In that case Import should run
much faster and produce better results.
Thank you for your suggestion.
Andre
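
The column-description idea can be sketched by hand, assuming a semicolon
separator, comma decimals, and no quoted fields that contain ";". Everything
below (toNumber, toDate, spec, parseLine and the four-column sample lines) is
hypothetical and only illustrates the approach; it is not a built-in Import
option.

```mathematica
(* Converters: an empty field becomes Null, anything else is parsed *)
toNumber[""] = Null;
(* ToExpression evaluates its input, so use it only on trusted data *)
toNumber[s_String] := ToExpression[StringReplace[s, "," -> "."]]
toDate[""] = Null;
toDate[s_String] := FromDigits /@ StringSplit[s, "-"]

(* Hypothetical column spec: one converter per field, in column order *)
spec = {FromDigits, toDate, toNumber, Identity};

(* Split on ";" (All keeps empty fields at the ends of the line),
   pad or trim to the spec length, then apply each converter to its field *)
parseLine[line_String] :=
  MapThread[#1[#2] &,
    {spec, PadRight[StringSplit[line, ";", All], Length[spec], ""]}]

(* Made-up lines with columns ID;BELEG_DAT;SOLL;BuJahr.
   For a real file one could use: lines = Rest[ReadList["test.csv", String]] *)
lines = {"1;;7807477,41;", "2;2008-01-01;387,37;"};
rows = parseLine /@ lines
(* the second row comes out as {2, {2008, 1, 1}, 387.37, ""} *)
```

Each field is converted exactly once by the function named for its column, so
no type guessing happens per cell.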

On 19.05.2011 15:46, Hans Michel wrote:
> Andre:
> The following does work
>
> Import["D:\\akk\\test.csv", "Table", {"FieldSeparators" ->  ";",
>    "CharacterEncoding" ->  "ASCII", "HeaderLines" ->  1,
>    "EmptyField" ->  "", "RepeatedSeparators" ->  False,
>    "DateStringFormat" ->  {"Year", "-", "Month", "-", "Day"}}]
>
> See the Options for "Table" import
>
> ref/format/Table
>
> Hans
>
> -----Original Message-----
> From: Andre Koppel [mailto:akoppel at akso.de]
> Sent: Wednesday, May 18, 2011 6:18 AM
> To: mathgroup at smc.vnet.net
> Subject: [mg118980] unable to import csv-Data
>
> Hello to all,
>
> I am trying to import some data from a csv-file, but I am absolutely
> unable to get any useful result.
> I have tried several options for formatting during input, but in every
> case Mathematica 8 puts several csv-columns
> into one result-column.
> Because the csv-data contains German encoding, I have tried several
> conversion options, but nothing helps.
> Here is a snapshot of the csv-data (one header line, two data lines):
> --------------------- cut here ------------------------
> ID;KONTO_NR;KONTO_BEZ;BELEG_DAT;BELEG_NR;GKTO_NR;GKTO_BEZ;BU_TEXT;SOLL;HABEN;Buchsaldo;WAEHRUNG;Faelligkeit;Anfangsbestand;Ausgeblendet;Changed;InsoBase User;BuJahr
> 1;;;;;;;;7807477,41;6986382,79;,00;;;False;False;2010-11-01 10:24:09.997;KDLB\Conrad;
> 2;D_60004;Jeske, Norbert 23966 Hof Triwalk;2008-01-01;;S_09008;Vortrag;EB-Werte durch AIS TaxAudit berechnet und erstellt;387,37;,00;387,37;EUR;;True;True;2010-11-01 10:24:09.997;KDLB\Conrad;
> --------------------- cut here ------------------------
> I have tried the following import-command (and several versions of it),
> but did not get useful import-result:
> imp = Import["test.csv", "Table", "FieldSeparators" ->   ";",
>      "DateStringFormat" ->   { "Year", "-", "Month", "-", "Day"},
>      "CharacterEncoding" ->   "ASCII", "HeaderLines" ->   1] ;
>
> To me it looks like the CSV importer is unable to detect NULL values
> (;;)?!?
>
> By the way, reading the data into Excel, writing the result to an xls-file
> and importing that xls-file into Mathematica works out of the box,
> but I can't go this way because there are more than 200000 data lines;
> Excel did not support such a large amount, and Mathematica
> was unable to import Excel 2010 xlsx data (ods didn't work
> either because of the large amount of data).
>
> Any help would be highly appreciated
> Kind regards
> Andre
>


--
Andre Koppel Software GmbH
Prinz-Handjery-Str. 38
14167 Berlin
Tel.: (+4930) 810 09 190
Fax: (+4930) 326 01 046
www.invep.de
www.akso.de

Registered at the Amtsgericht (district court)
Berlin Charlottenburg, HRB92600
Managing Director: Andre Koppel


