MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Random Number Generation : Is there a faster way of doing this?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg43207] Re: Random Number Generation : Is there a faster way of doing this?
  • From: "Christos Argyropoulos M.D." <argchris at otenet.gr>
  • Date: Wed, 20 Aug 2003 22:24:53 -0400 (EDT)
  • Sender: owner-wri-mathgroup at wolfram.com

Hi,
I want to thank all the people who volunteered an answer even with the scant
information I provided about the problem. For those who are interested the
reference  about the algorithm that requires so many draws from Poisson
distributions is the following:

Stekel, D.J., Git, Y., Falciani,F. The Comparison of Gene Expression from
Multiple cDNA Libraries. Genome Research, 10:2055-2061 (2000)

Out of the answers I have received so far, I single out two propositions :
1)  stepping out the kernel in order to do the draws using the GNU
Scientific Library and
2) using DiscreteRNG from the MathStatica add-on (fortunately I own a copy).

Although 1) is definitely faster (based on timing data published on
MathGroup) when it comes to the drawing of samples per se, the problem
is that the samples formed have to be duplicated in memory during data
transfer between the external process and the kernel (so in my case
it is a no-no due to RAM limitations). I suspect that one could use a
clever data structure in the external process to store samples and free
up memory while transfering the samples in chunks, or store the data on
the disk (to avoid mathlink communication overhead and RAM limitations)
but I'd rather pass on this option for now.  Still I think it might be
worth the effort in other contexts (i.e.  Reversible Jump MCMC methods)
and with a lot of RAM (more than the 256MB I have)

2) The other solution uses MathStatica's DiscreteRNG function and hence
onecould do the draws without stepping out of the kernel. Using my
original function I could draw 100 (one hundred) samples in 351 seconds,
whereas using the implementation in MathStatica 1000 (one thousand)
samples took 67 secs and 2000 samples 94. (each sample involves drawing
a random number from 20000 different Poisson distributions)

To complete the problem, I provide the implementation using both the
standard RNG functions and Mathstatica. The two functions will implement
the methods listed in the aforementioned article; i.e. calculate the
parameters of the Poisson Distributions for each of the digital gene
expression profiles (dat_List: these are the data provided by the user)
and will generate a number of samples (size_Integer) from the null
hypothesis of no differences.

USING MATHSTATICA

Simulate[dat_List,size_Integer]:=
  Module[{F=Map[Apply[Plus,#]&,dat],totN=Nest[Apply[Plus,#]&,dat,2],
      xlogx=Map[If[#\[Equal]0,0,# Log[#]]&,dat,{2}],
      Nlib=Nest[Apply[Plus,#]&,dat,1],poisson},F/=totN;
    poisson= q^x/(E^q*x!);
     domain[poisson] = {x, 0, Infinity} && {q>0} && {Discrete};
    poissonparams=N[Map[Times[#,Nlib]&,F]];
    Transpose[
      Map[If[#\[Equal]0,Table[0,{size}],
            DiscreteRNG[size,poisson/.q\[Rule]#]]&,poissonparams,{2}],{2,3,
        1}]
    ]

USING STANDARD MATHEMATICA FUNCTIONS

Simulate[dat_List,size_Integer]:=
  Module[{F=Map[Apply[Plus,#]&,dat],totN=Nest[Apply[Plus,#]&,dat,2],
      xlogx=Map[If[#\[Equal]0,0,# Log[#]]&,dat,{2}],
      Nlib=Nest[Apply[Plus,#]&,dat,1]},F/=totN;
    poissonparams=N[Map[Times[#,Nlib]&,F]];
    fnd[x_]:=
      Map[If[#\[Equal]0,0,Random[PoissonDistribution[#]]]&,
        poissonparams,{2}];
    Array[fnd,{size}]
    ]

Thanks everyone for the suggestions,

Christos Argyropoulos



  • Prev by Date: Re: New version, new bugs
  • Next by Date: Re: Random Number Generation : Is there a faster way of doing this ?
  • Previous by thread: Re: Random Number Generation : Is there a faster way of doing this?
  • Next by thread: Re: Random Number Generation : Is there a faster way of doing this?