Re: shuffling 10^8 numbers
- To: mathgroup at smc.vnet.net
- Subject: [mg53213] Re: [mg53180] shuffling 10^8 numbers
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Thu, 30 Dec 2004 01:38:22 -0500 (EST)
- References: <200412281130.GAA26970@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
George Szpiro wrote: > Hi, > > I am trying to shuffle 10^8 numbers stored in the file GG.doc in the root > directory. (Size of GG.doc appros 360 MB) > > Accorrding to previous suggestions from this group I try to shuffle them > witht he following program: > > GG=OpenRead["c:\GG.doc"]; > AA=ReadList[GG]; > Timing[ > OrigList=Table[AA]; > p=RandomPermutation@Length@OrigList; > ShuffledList=OrigList[[p]]; > > > But the file is far too big. I can read it but then I get the following > error message: > > <<No more memory available. Mathematica kernel has shut down. Try quitting > other applications and then retry.>> > > No other programs are open, so I guess I am at the limit. Can anybody > suggest a workaround? Is there a possibility to shuffle numbers without > loading them all into memory simultaneously? > > NEW IDEA: I thought there might be a possibility of just reading one single > number each time from the file GG.doc, and putting them into a randomly > chosen slot in a new file. > > Any answeres greatly appreciated to: > george at netvision.net.il > > Thanks, > George My prior reply I point out that the memory requirements for generating a permutation and keeping both list and shuffled list can be close to the maximum allowable, if the list consists of machine reals. I should also point out that a purely in-place shuffle can manage 10^8 machine reals. Below code will do this. It is slow because we do not want to copy the original list, and hence lose the ability to use Compile. SetAttributes[shuffleSet, HoldFirst]; shuffleSet[ll_] := Module[{n=Length[ll], rand}, Do[ rand = Random[Integer, {j,n}]; ll[[{j,rand}]] = ll[[{rand,j}]], {j, n}]; ] len = 10^8; ll = Table[Random[],{len}]; Timing[shuffleSet[ll]] Out[6]= {904. Second, Null} A drawback is that there may be a copy of the original list lying around even though we did the shuffling in place. Use of <<Utilities`CleanSlate` in the AddOns:ExtraPackages directory might help to prevent this. Daniel Lichtblau Wolfram Research
- References:
- shuffling 10^8 numbers
- From: George Szpiro <george@netvision.net.il>
- shuffling 10^8 numbers