Re: shuffling 10^8 numbers
- To: mathgroup at smc.vnet.net
- Subject: [mg53213] Re: [mg53180] shuffling 10^8 numbers
- From: Daniel Lichtblau <danl at wolfram.com>
- Date: Thu, 30 Dec 2004 01:38:22 -0500 (EST)
- References: <200412281130.GAA26970@smc.vnet.net>
- Sender: owner-wri-mathgroup at wolfram.com
George Szpiro wrote:
> Hi,
>
> I am trying to shuffle 10^8 numbers stored in the file GG.doc in the root
> directory. (Size of GG.doc appros 360 MB)
>
> Accorrding to previous suggestions from this group I try to shuffle them
> witht he following program:
>
> GG=OpenRead["c:\GG.doc"];
> AA=ReadList[GG];
> Timing[
> OrigList=Table[AA];
> p=RandomPermutation@Length@OrigList;
> ShuffledList=OrigList[[p]];
>
>
> But the file is far too big. I can read it but then I get the following
> error message:
>
> <<No more memory available. Mathematica kernel has shut down. Try quitting
> other applications and then retry.>>
>
> No other programs are open, so I guess I am at the limit. Can anybody
> suggest a workaround? Is there a possibility to shuffle numbers without
> loading them all into memory simultaneously?
>
> NEW IDEA: I thought there might be a possibility of just reading one single
> number each time from the file GG.doc, and putting them into a randomly
> chosen slot in a new file.
>
> Any answeres greatly appreciated to:
> george at netvision.net.il
>
> Thanks,
> George
My prior reply I point out that the memory requirements for generating a
permutation and keeping both list and shuffled list can be close to the
maximum allowable, if the list consists of machine reals. I should also
point out that a purely in-place shuffle can manage 10^8 machine reals.
Below code will do this. It is slow because we do not want to copy the
original list, and hence lose the ability to use Compile.
SetAttributes[shuffleSet, HoldFirst];
shuffleSet[ll_] := Module[{n=Length[ll], rand},
Do[
rand = Random[Integer, {j,n}];
ll[[{j,rand}]] = ll[[{rand,j}]],
{j, n}];
]
len = 10^8;
ll = Table[Random[],{len}];
Timing[shuffleSet[ll]]
Out[6]= {904. Second, Null}
A drawback is that there may be a copy of the original list lying around
even though we did the shuffling in place. Use of
<<Utilities`CleanSlate`
in the AddOns:ExtraPackages directory might help to prevent this.
Daniel Lichtblau
Wolfram Research
- References:
- shuffling 10^8 numbers
- From: George Szpiro <george@netvision.net.il>
- shuffling 10^8 numbers