MathGroup Archive: July 2007 [00882]

[Date Index] [Thread Index] [Author Index]
Re: generating non-IID random sequences
To: mathgroup at smc.vnet.net
Subject: [mg79285] Re: generating non-IID random sequences
From: Mark Fisher <particlefilter at gmail.com>
Date: Mon, 23 Jul 2007 03:47:45 -0400 (EDT)
References: <f7kegd$58u$1@smc.vnet.net><200707210835.EAA29057@smc.vnet.net>
On Jul 22, 4:34 am, Carl Woll <ca... at wolfram.com> wrote:
> Yaroslav Bulatov wrote:
> >The Compiled one-liner (fun2) seems to be the fastest solution.
>
> >I also compared it with a simple java solution, and it was about 2.7
> >times faster (19 seconds vs 7 seconds). Using Java solution through
> >JLink seems to take about the same time as running stand-alone.
> >Running JVM with some optimization flags (server,compileThreshold)
> >took the time to 6 seconds. A simple C solution took 2 seconds, 1.5
> >seconds with gcc -O9 optimization.
>
> >The bottom line seems to be that efficient Mathematica solution can be
> >25 times faster than inefficient one, porting to C can give 10x speed-
> >up, and Java is somewhere in between.
>
> Here is a much faster solution:
>
> markov[len_, p_] := BitAnd[
>   Accumulate @ Prepend[ UnitStep[RandomReal[{-p, 1 - p}, len]],
> RandomInteger[]],
>   1
> ]
>
> In[7]:= fun2[10^7, .9]; // Timing
> Out[7]= {4.656,Null}
>
> In[8]:= markov[10^7, .9]; // Timing
> Out[8]= {0.766,Null}
>
> A test that toggles occur about 10% of the time:
>
> In[10]:= res = markov[10^7, .9];
> N@Total[BitXor[Most[res], Rest[res]]]/(Length[res] - 1)
>
> Out[11]= 0.0998825
>
> A bit of explanation. In general, an efficient Mathematica algorithm
> will use vector/array operations as much as possible, so that the kernel
> function does the looping through a list. In this vein, both BitAnd[_,
> 1] and UnitStep[_] are vector operations relying on the listability of
> BitAnd and UnitStep.
>
> In fun2, the function NestList is slow because it has to loop through
> the list, and doesn't take advantage of vector operations. It might seem
> at first glance that a NestList type of approach is unavoidable, since
> your application needs to repeatedly apply a custom function to an
> initial value. However, there is at least one kernel function,
> Accumulate, that very quickly applys a particular function (Plus) to an
> initial value (I can't think of any others that are as quick).
> Accumulate is a new version 6 function, and it can be about an order of
> magnitude quicker than the old equivalent FoldList[Plus, 0, list]
> approach. So, using Accumulate is the key to the speed improvement.
>
> In order to apply Accumulate, I created a random vector having 0s with
> probability p and 1s with probability 1-p. Then applying Accumulate will
> create a list like:
>
> {0,0,0,1,1,1,2,2,2, ...}
>
> The even numbers correspond to one value and the odd numbers correspond
> to the other value, since toggling twice brings us back to the same value.
>
> One more comment. If your probability p is equal to a fraction with
> small integers, say p=num/den, then it is possible to speed things up
> further. For example, with p=.9, we have:
>
> In[21]:= UnitStep[RandomReal[{-.9, .1}, 10^7]]; // Timing
> UnitStep[RandomInteger[{-9, 0}, 10^7]]; // Timing
>
> Out[21]= {0.562,Null}
>
> Out[22]= {0.297,Null}
>
> This improvement brings the timing down to about .5 seconds, which is
> almost 10 times faster than your fun2 function.
>
> Carl Woll
> Wolfram Research
>
> >fun2 = Compile[{{n, _Integer}, {p, _Real}},
> >    NestList[
> >     If[# == 0, If[RandomReal[] < p, 0, 1],
> >       If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]];
> >Timing[fun2[4 10^7, .9]][[1]]
>
> >public class hmm {
> >    public static int[] sample(double p, int length) {
> >    int[] result = new int[length];
> >    result[0]=(Math.random()>.5)?0:1;
> >    for (int i = 1; i < length; ++i)
> >        result[i]=(Math.random()>p)?(1-result[i-1]):result[i-1];
> >    return result;
> >    }
>
> >    public static void main(String[] args) {
> >        long start = System.currentTimeMillis();
> >        sample(.9,4*10000000);
> >        System.out.println((System.currentTimeMillis()-start)/1000.);
> >    }
> >}
>
> >#include <stdio.h>
> >#include <stdlib.h>
> >#include <time.h>
> >#define n 40000000
>
> >main() {
> >  int* result = malloc(4*n);
> >  double start = clock();
> >  int i;
> >  result[0] = 0;
> >  for (i = 1; i<n;++i)
> >    result[i]=(rand()/((double)RAND_MAX + 1)>.9)?(1-result[i-1]):
> >(result[i-1]);
> >  printf("%d\n",result[i-1]);
> >  printf("%f\n",(clock()-start)/CLOCKS_PER_SEC);
> >  return 0;
> >}
>
> >On Jul 20, 12:35 am, Mark Fisher <particlefil... at gmail.com> wrote:
>
> >>On Jul 18, 3:12 am, Yaroslav Bulatov <yarosla... at gmail.com> wrote:
>
> >>>I'm looking for a fast way to sample from a Markov-1 sequence of
> >>>random bits. The method below is 3600 times slower than built-in
> >>>RandomInteger function, can it be made much faster?
>
> >>>p = 0.9; (* the probability of encountering 00 or 11 *)
> >>>f = RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, 0}]
> >>>&;
> >>>NestList[f, RandomChoice[{0, 1}], 100000]
>
> >>Here are 3 functions. The first is your version; the second is your
> >>version compiled; the third is a slightly different version also
> >>compiled.
>
> >>fun = Function[{n, p},
> >>   NestList[
> >>    RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1,
> >>        0}] &, RandomChoice[{0, 1}], n]];
>
> >>fun1 = Compile[{{n, _Integer}, {p, _Real}},
> >>   NestList[
> >>    RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1,
> >>        0}] &, RandomChoice[{0, 1}], n]];
>
> >>fun2 = Compile[{{n, _Integer}, {p, _Real}},
> >>   NestList[
> >>    If[# == 0, If[RandomReal[] < p, 0, 1],
> >>      If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]];
>
> >>This gives the relative timings:
>
> >>First[#]/#& @ Table[Timing[i[10^5, .9]][[1]], {i, {fun, fun1, fun2}}]
>
> >>On my machine I get a factor of 5 speed up for fun1 and a factor of 25
> >>speed up for fun2 relative to fun.
>
> >>--Mark.

Carl makes the clever suggestion to speed things up further using
RandomInteger, but he seems to suggest this idea isn't generally
applicable by referring to "small integers". A small amount of
experimentation suggests it works well for arbitrary p (as long as p
is a machine number). One simply uses Rationalize[p,0] to get the
nearest fraction. Here's a modification of Carl's function:

markov1[len_, p_] :=
 With[{r = Rationalize[p, 0]},
  BitAnd[Accumulate@
    Prepend[UnitStep[
      RandomInteger[{-Numerator[r],
        Denominator[r] - Numerator[r] - 1}, len]], RandomInteger[]],
   1]]

--Mark
References:
- Re: generating non-IID random sequences
  - From: Yaroslav Bulatov <yaroslavvb@gmail.com>
Prev by Date: Re: Re: Wolfram Workbench 1.1 now available
Next by Date: NDSolve plot problem
Previous by thread: Re: Re: generating non-IID random sequences
Next by thread: two integrals