Re: Random Number Generation : Is there a faster way of doing this?
- To: mathgroup at smc.vnet.net
- Subject: [mg43207] Re: Random Number Generation : Is there a faster way of doing this?
- From: "Christos Argyropoulos M.D." <argchris at otenet.gr>
- Date: Wed, 20 Aug 2003 22:24:53 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
Hi, I want to thank all the people who volunteered an answer even with the scant information I provided about the problem. For those who are interested the reference about the algorithm that requires so many draws from Poisson distributions is the following: Stekel, D.J., Git, Y., Falciani,F. The Comparison of Gene Expression from Multiple cDNA Libraries. Genome Research, 10:2055-2061 (2000) Out of the answers I have received so far, I single out two propositions : 1) stepping out the kernel in order to do the draws using the GNU Scientific Library and 2) using DiscreteRNG from the MathStatica add-on (fortunately I own a copy). Although 1) is definitely faster (based on timing data published on MathGroup) when it comes to the drawing of samples per se, the problem is that the samples formed have to be duplicated in memory during data transfer between the external process and the kernel (so in my case it is a no-no due to RAM limitations). I suspect that one could use a clever data structure in the external process to store samples and free up memory while transfering the samples in chunks, or store the data on the disk (to avoid mathlink communication overhead and RAM limitations) but I'd rather pass on this option for now. Still I think it might be worth the effort in other contexts (i.e. Reversible Jump MCMC methods) and with a lot of RAM (more than the 256MB I have) 2) The other solution uses MathStatica's DiscreteRNG function and hence onecould do the draws without stepping out of the kernel. Using my original function I could draw 100 (one hundred) samples in 351 seconds, whereas using the implementation in MathStatica 1000 (one thousand) samples took 67 secs and 2000 samples 94. (each sample involves drawing a random number from 20000 different Poisson distributions) To complete the problem, I provide the implementation using both the standard RNG functions and Mathstatica. The two functions will implement the methods listed in the aforementioned article; i.e. calculate the parameters of the Poisson Distributions for each of the digital gene expression profiles (dat_List: these are the data provided by the user) and will generate a number of samples (size_Integer) from the null hypothesis of no differences. USING MATHSTATICA Simulate[dat_List,size_Integer]:= Module[{F=Map[Apply[Plus,#]&,dat],totN=Nest[Apply[Plus,#]&,dat,2], xlogx=Map[If[#\[Equal]0,0,# Log[#]]&,dat,{2}], Nlib=Nest[Apply[Plus,#]&,dat,1],poisson},F/=totN; poisson= q^x/(E^q*x!); domain[poisson] = {x, 0, Infinity} && {q>0} && {Discrete}; poissonparams=N[Map[Times[#,Nlib]&,F]]; Transpose[ Map[If[#\[Equal]0,Table[0,{size}], DiscreteRNG[size,poisson/.q\[Rule]#]]&,poissonparams,{2}],{2,3, 1}] ] USING STANDARD MATHEMATICA FUNCTIONS Simulate[dat_List,size_Integer]:= Module[{F=Map[Apply[Plus,#]&,dat],totN=Nest[Apply[Plus,#]&,dat,2], xlogx=Map[If[#\[Equal]0,0,# Log[#]]&,dat,{2}], Nlib=Nest[Apply[Plus,#]&,dat,1]},F/=totN; poissonparams=N[Map[Times[#,Nlib]&,F]]; fnd[x_]:= Map[If[#\[Equal]0,0,Random[PoissonDistribution[#]]]&, poissonparams,{2}]; Array[fnd,{size}] ] Thanks everyone for the suggestions, Christos Argyropoulos