MathGroup Archive 2004

Re: shuffling 10^8 numbers

  • To: mathgroup at smc.vnet.net
  • Subject: [mg53190] Re: shuffling 10^8 numbers
  • From: David Bailey <dave at Remove_Thisdbailey.co.uk>
  • Date: Tue, 28 Dec 2004 23:12:06 -0500 (EST)
  • References: <cqrgpe$qgv$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

George Szpiro wrote:
> Hi,
> 
> I am trying to shuffle 10^8 numbers stored in the file GG.doc in the root 
> directory. (Size of GG.doc approx. 360 MB)
> 
> According to previous suggestions from this group I try to shuffle them 
> with the following program:
> 
> GG=OpenRead["c:\\GG.doc"];
> AA=ReadList[GG];
> Timing[
>   OrigList=AA;
>   p=RandomPermutation@Length@OrigList;
>   ShuffledList=OrigList[[p]];
> ]
> 
> 
> But the file is far too big. I can read it but then I get the following 
> error message:
> 
> <<No more memory available. Mathematica kernel has shut down. Try quitting 
> other applications and then retry.>>
> 
> No other programs are open, so I guess I am at the limit. Can anybody 
> suggest a workaround? Is there a possibility to shuffle numbers without 
> loading them all into memory simultaneously?
> 
> NEW IDEA: I thought there might be a possibility of just reading one single 
> number each time from the file GG.doc, and putting them into a randomly 
> chosen slot in a new file.
> 
> Any answers greatly appreciated to:
> george at netvision.net.il
> 
> Thanks,
> George
> 
> 

I think speed may be a problem with this size of file whatever you do, 
but you could read your file one number at a time using Read with a 
type of Number. Then, if you have version 5.1, you could use 
BinaryWrite to write the values to a number of smaller binary files. You 
could then shuffle each file in turn before combining them by repeatedly 
reading a number from a randomly chosen file and writing it to your 
final output file (presumably in text format). Since you would never 
have the whole file in memory at one time, you should not hit memory 
limits.
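
The chunk-and-merge scheme described above might be sketched roughly as 
follows. This is only an illustration under several assumptions: the 
names externalShuffle and chunkSize and the ".part" file suffix are 
hypothetical; RandomSample and RandomInteger need version 6 or later (in 
5.1 a permutation function from Combinatorica and Random[Integer, ...] 
would have to be substituted); the numbers are assumed to fit the 
"Real64" binary type; ReadList with a count is used to pull in a whole 
chunk at once instead of calling Read for every number; and the merge 
picks a chunk with probability proportional to how many numbers are 
still left in it, which is one way to keep the overall shuffle uniform.

```mathematica
(* Sketch only: all names here are hypothetical, not from the original post *)
externalShuffle[inFile_String, outFile_String, chunkSize_Integer?Positive] :=
 Module[{in, out, s, chunk, names = {}, counts = {}, streams,
   remaining, r, k},
  (* Pass 1: read up to chunkSize numbers, shuffle them in memory,
     and save each shuffled chunk to its own binary file *)
  in = OpenRead[inFile];
  While[(chunk = ReadList[in, Number, chunkSize]) =!= {},
   AppendTo[names, outFile <> ".part" <> ToString[Length[names] + 1]];
   s = OpenWrite[Last[names], BinaryFormat -> True];
   BinaryWrite[s, RandomSample[N[chunk]], "Real64"];
   Close[s];
   AppendTo[counts, Length[chunk]]];
  Close[in];
  (* Pass 2: repeatedly pick a chunk file at random, weighted by the
     number of values it still holds, and copy one value to the output *)
  streams = Map[OpenRead[#, BinaryFormat -> True] &, names];
  out = OpenWrite[outFile];
  remaining = Total[counts];
  While[remaining > 0,
   r = RandomInteger[{1, remaining}];
   k = 1;
   While[r > counts[[k]], r -= counts[[k]]; k++];
   Write[out, BinaryRead[streams[[k]], "Real64"]];
   counts[[k]]--; remaining--];
  Scan[Close, streams];
  Close[out];
  DeleteFile[names];
  outFile]
```

A call might look like externalShuffle["c:\\GG.doc", "c:\\GGshuffled.txt", 
10^6], where the output file name is made up. Only one chunk is held in 
memory at a time, so chunkSize sets the memory ceiling. Because the 
values pass through "Real64", integers in the input come out written as 
reals (for example 5.); an integer type such as "Integer32" could be 
used instead if the data are known to be integers in that range.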

Before spending a great deal of effort on this, I would time a program 
that simply reads your file one number at a time and writes another 
(without shuffling). The cost of I/O is likely to dominate, so you will 
get an idea of performance if you do this. If it is too slow, you may 
have to think about C++.
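
A timing test of this kind could be sketched as below. The function name 
copyTiming and the maxCount argument are hypothetical; maxCount is there 
so that the test can be run on, say, the first 10^6 numbers and the 
result scaled up, rather than waiting for the whole file. AbsoluteTiming 
is used because it reports elapsed time, which is what matters when the 
work is mostly I/O.

```mathematica
(* Sketch only: copy up to maxCount numbers, one Read/Write pair each,
   and return {numbers copied, elapsed time} *)
copyTiming[inFile_String, outFile_String, maxCount_Integer] :=
 Module[{in, out, x, n = 0, t},
  in = OpenRead[inFile];
  out = OpenWrite[outFile];
  t = First[AbsoluteTiming[
     While[n < maxCount && (x = Read[in, Number]) =!= EndOfFile,
      Write[out, x];
      n++]]];
  Close[in];
  Close[out];
  {n, t}]
```

For example, copyTiming["c:\\GG.doc", "c:\\copytest.txt", 10^6] (the 
second file name is made up) times the first million numbers; 
multiplying the result by 100 gives a rough estimate for all 10^8.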

Your idea would work with a binary file (where every number takes the 
same number of bytes) but you would have to ensure that you did not 
write several numbers into the same slot (and therefore leave others empty).
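
The collision problem can be seen in a small experiment. The following 
is a sketch, assuming version 6 or later for RandomInteger and 
RandomSample, with n chosen arbitrarily. If every number picks its slot 
independently, a fraction of about (1 - 1/n)^n of the slots, which 
approaches 1/E, is expected to stay empty, whereas a random permutation 
uses each slot exactly once.

```mathematica
n = 10^5;

(* each number picks a slot independently: collisions are likely *)
slots = RandomInteger[{1, n}, n];
empty = n - Length[Union[slots]];   (* slots that received no number *)
N[empty/n]                          (* typically close to 1/E, about 0.368 *)

(* a random permutation assigns every slot exactly once *)
perm = RandomSample[Range[n]];
Length[Union[perm]] == n
```

So the slot for each number has to come from a permutation (or from some 
bookkeeping of which slots are still free), not from independent random 
draws.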

David Bailey
dbaileyconsultancy.co.uk


