Re: shuffling 10^8 numbers

*To*: mathgroup at smc.vnet.net
*Subject*: [mg53207] Re: [mg53180] shuffling 10^8 numbers
*From*: Daniel Lichtblau <danl at wolfram.com>
*Date*: Tue, 28 Dec 2004 23:12:45 -0500 (EST)
*References*: <200412281130.GAA26970@smc.vnet.net>
*Sender*: owner-wri-mathgroup at wolfram.com

George Szpiro wrote:
> Hi,
>
> I am trying to shuffle 10^8 numbers stored in the file GG.doc in the root
> directory. (Size of GG.doc approx. 360 MB)
>
> According to previous suggestions from this group I try to shuffle them
> with the following program:
>
> GG=OpenRead["c:\GG.doc"];
> AA=ReadList[GG];
> Timing[
> OrigList=Table[AA];
> p=RandomPermutation@Length@OrigList;
> ShuffledList=OrigList[[p]];
>
>
> But the file is far too big. I can read it but then I get the following
> error message:
>
> <<No more memory available. Mathematica kernel has shut down. Try quitting
> other applications and then retry.>>
>
> No other programs are open, so I guess I am at the limit. Can anybody
> suggest a workaround? Is there a possibility to shuffle numbers without
> loading them all into memory simultaneously?
>
> NEW IDEA: I thought there might be a possibility of just reading one single
> number each time from the file GG.doc, and putting them into a randomly
> chosen slot in a new file.
>
> Any answers greatly appreciated to:
> george at netvision.net.il
>
> Thanks,
> George

On many systems the maximum memory Mathematica can use is about 2 GB. One can certainly form a random permutation of size 10^8 while staying inside this memory limit.

    (* Compiled in-place shuffle of Range[n]: at step j, swap element j
       with an element chosen uniformly from positions j through n. *)
    shuffleC = Compile[{{n, _Integer}}, Module[
        {res = Range[n], tmp, rand},
        Do[
          rand = Random[Integer, {j, n}];
          tmp = res[[j]];
          res[[j]] = res[[rand]];
          res[[rand]] = tmp,
          {j, 1, n}];
        res
        ]];

    In[2]:= MaxMemoryUsed[]
    Out[2]= 3310600

    In[3]:= Timing[shuf8 = shuffleC[10^8];]
    Out[3]= {114.35 Second, Null}

    In[4]:= MaxMemoryUsed[]
    Out[4]= 403282424

As expected, the permutation takes around 4*10^8 bytes. Now let's form an array of 10^8 random machine reals.

    In[5]:= ll = Table[Random[],{10^8}];

Not surprisingly, this takes up another 8*10^8 bytes.

    In[6]:= MaxMemoryUsed[]
    Out[6]= 1203284792

Forming the permuted array will require another 8*10^8 bytes, putting us up against that limit.
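To make "up against that limit" concrete, here is the arithmetic worked out from the MaxMemoryUsed[] readings above. It assumes the "about 2 GB" ceiling means 2^31 bytes, which is an assumption for illustration. The actual ceiling depends on the operating system and the Mathematica version.

```latex
\begin{aligned}
\text{permutation } (10^8 \text{ machine integers}) &: 403282424 - 3310600 = 399971824 \approx 4 \times 10^8 \text{ bytes} \\
\text{data } (10^8 \text{ machine reals}) &: 1203284792 - 403282424 = 800002368 \approx 8 \times 10^8 \text{ bytes} \\
\text{projected peak with a permuted copy} &: 1203284792 + 8 \times 10^8 = 2003284792 \text{ bytes} \\
\text{assumed ceiling} &: 2^{31} = 2147483648 \text{ bytes} \\
\text{remaining headroom} &: 2147483648 - 2003284792 = 144198856 \approx 1.4 \times 10^8 \text{ bytes}
\end{aligned}
```

Under that assumption the three arrays together leave roughly 144 MB of slack, which is less than one additional copy of either the permutation or the data would need.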
It may be the case that you are on a system that can use more memory in Mathematica, or that your 10^8 elements do not take up as much storage as ll above (e.g. if there are many repeats, or if it is an array of machine integers). All the same, it appears that you are likely to be near a memory limitation if, say, there is another large list lurking somewhere.

Daniel Lichtblau
Wolfram Research

**References**:
**shuffling 10^8 numbers**
*From:* George Szpiro <george@netvision.net.il>