MathGroup Archive 2012

Re: Importing large file into table and calculating takes a long time. How to improve efficiency?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg126365] Re: Importing large file into table and calculating takes a long time. How to improve efficiency?
  • From: David Bailey <dave at removedbailey.co.uk>
  • Date: Thu, 3 May 2012 22:25:12 -0400 (EDT)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com

On 01/05/2012 10:26, Gangerolv wrote:
> (First my disclaimer, I'm new to mathematica)
>
> I'm importing a file with three values, x,y,z in hexadecimal.
> Sample of the input data: {0., 24, "009d"}, {0., 28, 9}, {0., 28, 99}, {"00dc", 27, 98}, {0., 29, 95},...
>
> This set is converted to integer: {0., 36, 157}, {0., 40, 9}, {0., 40, 153}, {220, 39, 152}, {0., 41, 149},...
>
> A final vector is calculated using: g==sqrt(x^2 + y^2 + z^2) for each data set.
>
> The data is then plotted.
>
> All of this is good and works great for a small file (2000 data sets). But when I try to import and calculate a larger file (over 100k data sets), it seems to take forever. Either my methods are not efficient (use of Table), or I'm not using the correct settings for importing the data. The file is only 2Mb so I know Mathematica should be able to handle it.
>
> Here's what I'm doing:
>
> ==============================================================
> dataHex ==
>   Import["C:\\Projects\\Mathematica\\test.csv"]
> points == Length[dataHex]
> dataDec == Table[
>    ToExpression["16^^"<>  #]&  /@ {
>      ToString[dataHex[[i, 1]]],
>      ToString[dataHex[[i, 2]]],
>      ToString[dataHex[[i, 3]]]},
>    {i, 1, points}]
> dataDecCompl == Table[{
>      If[dataDec[[i, 1]]>  32768, dataDec[[i, 1]] - 65536, dataDec[[i, 1]]],
>      If[dataDec[[i, 2]]>  32768, dataDec[[i, 2]] - 65536, dataDec[[i, 2]]],
>      If[dataDec[[i, 3]]>  32768, dataDec[[i, 2]] - 65536, dataDec[[i, 3]]]},
>     {i, 1, points}];
>
> dataG == MovingAverage[Table[
>      Sqrt[dataX[[i]]^2 + dataY[[i]]^2 + dataZ[[i]]^2] // N,
>      {i, 1, points - filter}],
>     3];
> plotG == ListLinePlot[{dataG}, PlotRange ->  All];
> ===========================================================================
>
First, you are writing == for assignment where you mean =. However, since 
you say you have had this code working, I assume this is an artifact of 
copying it here.

I would start by adding Print statements such as Print["Step 1"]; 
between each step of the process, so you can get an idea of where the 
time is going. Add one after the Import command, then one after the 
first Table command, and so on. That way you will not waste time 
speeding up the wrong part of your code.
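
A variation on the same idea is to wrap each step in AbsoluteTiming, 
which reports the elapsed seconds together with the result. The sketch 
below is only an illustration: it uses synthetic data in place of your 
file, and the names n, tConvert and tNorm are made up for the example.

```mathematica
(* Synthetic stand-in for the imported data: 100000 rows of three
   4-digit hex strings. The real data would come from the file. *)
n = 100000;
dataHex = Table[IntegerString[RandomInteger[65535], 16, 4], {n}, {3}];

(* AbsoluteTiming returns {seconds, result} *)
{tConvert, dataDec} =
  AbsoluteTiming[Map[FromDigits[#, 16] &, dataHex, {2}]];
Print["Hex conversion: ", tConvert, " s"];

{tNorm, dataG} = AbsoluteTiming[N[Norm /@ dataDec]];
Print["Magnitude: ", tNorm, " s"];
```

Timing Import on the real file in the same way should show whether the 
time goes into reading the data or into the calculations.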

However, before anything else I'd make sure your code is really doing 
what you want! If you read in hex values the way you are doing, you 
will get errors. For example:

In[5]:= ToString[2b2a4]

Out[5]= "2 b2a4"
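
The reason is that 2b2a4 is parsed as the number 2 multiplied by a 
symbol called b2a4 before ToString ever sees it. A minimal 
illustration, assuming b2a4 has no value assigned in your session:

```mathematica
FullForm[2b2a4]
(* Times[2, b2a4] *)

(* The same digits kept as a string convert correctly *)
ToExpression["16^^" <> "2b2a4"]
(* 176804 *)
```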

I think reading a comma-separated file of hex values will be a bit 
tricky. I'd recommend using ReadList[file,String], which will give you a 
list of strings, one per line. You would then need to split each line 
using StringSplit, and then work on the resulting pieces as strings.
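
As a rough sketch of that approach, assuming each line of the file 
holds three comma-separated hex fields (I have not seen the file, so 
the sample lines below are made up to match the values in your post, 
and hexToInt is a hypothetical helper name):

```mathematica
(* Made-up sample lines; the real file's layout is an assumption.
   For a real file, something like:
   lines = ReadList["C:\\Projects\\Mathematica\\test.csv", String]; *)
lines = {"0,24,009d", "0,28,9", "00dc,27,98"};

(* Convert one hex string to an integer, ignoring stray whitespace *)
hexToInt[s_String] := FromDigits[StringTrim[s], 16];

(* Split each line at the commas, then convert every field *)
dataDec = Map[hexToInt, StringSplit[#, ","] & /@ lines, {2}]
(* {{0, 36, 157}, {0, 40, 9}, {220, 39, 152}} *)
```

Because every field stays a string until FromDigits converts it, 
nothing gets the chance to be read as a decimal number or a symbol 
along the way.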

Above all, I would say that you must test your code on tiny examples 
that you can check in some other way, because after you have run an 
averaging process on your data, you may never notice the errors!
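
For instance, a couple of checks with answers you can work out by 
hand. This is only a sketch: toSigned16 is a hypothetical helper, and 
it assumes your values are 16-bit two's complement, which the 
subtraction of 65536 in your code suggests.

```mathematica
(* Assumes 16-bit two's complement, as the subtraction of 65536
   in the original code suggests *)
toSigned16[n_Integer] := If[n >= 32768, n - 65536, n];

toSigned16 /@ {0, 1, 32767, 32768, 65535}
(* {0, 1, 32767, -32768, -1} *)

(* 3^2 + 4^2 + 12^2 = 169, so the magnitude must be 13 *)
Norm[{3, 4, 12}]
(* 13 *)
```

Checks of this kind on your own Table expressions should show up 
things like the third If in dataDecCompl, which subtracts from 
dataDec[[i, 2]] rather than dataDec[[i, 3]], and the test > 32768, 
which leaves the value 32768 itself unconverted.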

David Bailey
http://www.dbaileyconsultancy.co.uk



