MathGroup Archive 2004

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: shuffling 10^8 numbers

  • To: mathgroup at smc.vnet.net
  • Subject: [mg53213] Re: [mg53180] shuffling 10^8 numbers
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Thu, 30 Dec 2004 01:38:22 -0500 (EST)
  • References: <200412281130.GAA26970@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

George Szpiro wrote:
> Hi,
> 
> I am trying to shuffle 10^8 numbers stored in the file GG.doc in the root 
> directory. (Size of GG.doc appros 360 MB)
> 
> Accorrding to previous suggestions from this group I try to shuffle them 
> witht he following program:
> 
> GG=OpenRead["c:\GG.doc"];
> AA=ReadList[GG];
>   Timing[
>   OrigList=Table[AA];
>   p=RandomPermutation@Length@OrigList;
>   ShuffledList=OrigList[[p]];
> 
> 
> But the file is far too big. I can read it but then I get the following 
> error message:
> 
> <<No more memory available. Mathematica kernel has shut down. Try quitting 
> other applications and then retry.>>
> 
> No other programs are open, so I guess I am at the limit. Can anybody 
> suggest a workaround? Is there a possibility to shuffle numbers without 
> loading them all into memory simultaneously?
> 
> NEW IDEA: I thought there might be a possibility of just reading one single 
> number each time from the file GG.doc, and putting them into a randomly 
> chosen slot in a new file.
> 
> Any answeres greatly appreciated to:
> george at netvision.net.il
> 
> Thanks,
> George

My prior reply I point out that the memory requirements for generating a 
permutation and keeping both list and shuffled list can be close to the 
maximum allowable, if the list consists of machine reals. I should also 
point out that a purely in-place shuffle can manage 10^8 machine reals. 
Below code will do this. It is slow because we do not want to copy the 
original list, and hence lose the ability to use Compile.

SetAttributes[shuffleSet, HoldFirst];
shuffleSet[ll_] := Module[{n=Length[ll], rand},
   Do[
     rand = Random[Integer, {j,n}];
     ll[[{j,rand}]] = ll[[{rand,j}]],
     {j, n}];
   ]

len = 10^8;
ll = Table[Random[],{len}];

Timing[shuffleSet[ll]]
Out[6]= {904. Second, Null}

A drawback is that there may be a copy of the original list lying around 
even though we did the shuffling in place. Use of

<<Utilities`CleanSlate`

in the AddOns:ExtraPackages directory might help to prevent this.


Daniel Lichtblau
Wolfram Research





  • Prev by Date: Re: Re: including files
  • Next by Date: Re: Re: Zero testing
  • Previous by thread: Re: shuffling 10^8 numbers
  • Next by thread: Re: shuffling 10^8 numbers