Re: generating non-IID random sequences
- To: mathgroup at smc.vnet.net
- Subject: [mg79285] Re: generating non-IID random sequences
- From: Mark Fisher <particlefilter at gmail.com>
- Date: Mon, 23 Jul 2007 03:47:45 -0400 (EDT)
- References: <f7kegd$58u$1@smc.vnet.net><200707210835.EAA29057@smc.vnet.net>
On Jul 22, 4:34 am, Carl Woll <ca... at wolfram.com> wrote: > Yaroslav Bulatov wrote: > >The Compiled one-liner (fun2) seems to be the fastest solution. > > >I also compared it with a simple java solution, and it was about 2.7 > >times faster (19 seconds vs 7 seconds). Using Java solution through > >JLink seems to take about the same time as running stand-alone. > >Running JVM with some optimization flags (server,compileThreshold) > >took the time to 6 seconds. A simple C solution took 2 seconds, 1.5 > >seconds with gcc -O9 optimization. > > >The bottom line seems to be that efficient Mathematica solution can be > >25 times faster than inefficient one, porting to C can give 10x speed- > >up, and Java is somewhere in between. > > Here is a much faster solution: > > markov[len_, p_] := BitAnd[ > Accumulate @ Prepend[ UnitStep[RandomReal[{-p, 1 - p}, len]], > RandomInteger[]], > 1 > ] > > In[7]:= fun2[10^7, .9]; // Timing > Out[7]= {4.656,Null} > > In[8]:= markov[10^7, .9]; // Timing > Out[8]= {0.766,Null} > > A test that toggles occur about 10% of the time: > > In[10]:= res = markov[10^7, .9]; > N@Total[BitXor[Most[res], Rest[res]]]/(Length[res] - 1) > > Out[11]= 0.0998825 > > A bit of explanation. In general, an efficient Mathematica algorithm > will use vector/array operations as much as possible, so that the kernel > function does the looping through a list. In this vein, both BitAnd[_, > 1] and UnitStep[_] are vector operations relying on the listability of > BitAnd and UnitStep. > > In fun2, the function NestList is slow because it has to loop through > the list, and doesn't take advantage of vector operations. It might seem > at first glance that a NestList type of approach is unavoidable, since > your application needs to repeatedly apply a custom function to an > initial value. However, there is at least one kernel function, > Accumulate, that very quickly applys a particular function (Plus) to an > initial value (I can't think of any others that are as quick). > Accumulate is a new version 6 function, and it can be about an order of > magnitude quicker than the old equivalent FoldList[Plus, 0, list] > approach. So, using Accumulate is the key to the speed improvement. > > In order to apply Accumulate, I created a random vector having 0s with > probability p and 1s with probability 1-p. Then applying Accumulate will > create a list like: > > {0,0,0,1,1,1,2,2,2, ...} > > The even numbers correspond to one value and the odd numbers correspond > to the other value, since toggling twice brings us back to the same value. > > One more comment. If your probability p is equal to a fraction with > small integers, say p=num/den, then it is possible to speed things up > further. For example, with p=.9, we have: > > In[21]:= UnitStep[RandomReal[{-.9, .1}, 10^7]]; // Timing > UnitStep[RandomInteger[{-9, 0}, 10^7]]; // Timing > > Out[21]= {0.562,Null} > > Out[22]= {0.297,Null} > > This improvement brings the timing down to about .5 seconds, which is > almost 10 times faster than your fun2 function. > > Carl Woll > Wolfram Research > > >fun2 = Compile[{{n, _Integer}, {p, _Real}}, > > NestList[ > > If[# == 0, If[RandomReal[] < p, 0, 1], > > If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]]; > >Timing[fun2[4 10^7, .9]][[1]] > > >public class hmm { > > public static int[] sample(double p, int length) { > > int[] result = new int[length]; > > result[0]=(Math.random()>.5)?0:1; > > for (int i = 1; i < length; ++i) > > result[i]=(Math.random()>p)?(1-result[i-1]):result[i-1]; > > return result; > > } > > > public static void main(String[] args) { > > long start = System.currentTimeMillis(); > > sample(.9,4*10000000); > > System.out.println((System.currentTimeMillis()-start)/1000.); > > } > >} > > >#include <stdio.h> > >#include <stdlib.h> > >#include <time.h> > >#define n 40000000 > > >main() { > > int* result = malloc(4*n); > > double start = clock(); > > int i; > > result[0] = 0; > > for (i = 1; i<n;++i) > > result[i]=(rand()/((double)RAND_MAX + 1)>.9)?(1-result[i-1]): > >(result[i-1]); > > printf("%d\n",result[i-1]); > > printf("%f\n",(clock()-start)/CLOCKS_PER_SEC); > > return 0; > >} > > >On Jul 20, 12:35 am, Mark Fisher <particlefil... at gmail.com> wrote: > > >>On Jul 18, 3:12 am, Yaroslav Bulatov <yarosla... at gmail.com> wrote: > > >>>I'm looking for a fast way to sample from a Markov-1 sequence of > >>>random bits. The method below is 3600 times slower than built-in > >>>RandomInteger function, can it be made much faster? > > >>>p = 0.9; (* the probability of encountering 00 or 11 *) > >>>f = RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, 0}] > >>>&; > >>>NestList[f, RandomChoice[{0, 1}], 100000] > > >>Here are 3 functions. The first is your version; the second is your > >>version compiled; the third is a slightly different version also > >>compiled. > > >>fun = Function[{n, p}, > >> NestList[ > >> RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, > >> 0}] &, RandomChoice[{0, 1}], n]]; > > >>fun1 = Compile[{{n, _Integer}, {p, _Real}}, > >> NestList[ > >> RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, > >> 0}] &, RandomChoice[{0, 1}], n]]; > > >>fun2 = Compile[{{n, _Integer}, {p, _Real}}, > >> NestList[ > >> If[# == 0, If[RandomReal[] < p, 0, 1], > >> If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]]; > > >>This gives the relative timings: > > >>First[#]/#& @ Table[Timing[i[10^5, .9]][[1]], {i, {fun, fun1, fun2}}] > > >>On my machine I get a factor of 5 speed up for fun1 and a factor of 25 > >>speed up for fun2 relative to fun. > > >>--Mark. Carl makes the clever suggestion to speed things up further using RandomInteger, but he seems to suggest this idea isn't generally applicable by referring to "small integers". A small amount of experimentation suggests it works well for arbitrary p (as long as p is a machine number). One simply uses Rationalize[p,0] to get the nearest fraction. Here's a modification of Carl's function: markov1[len_, p_] := With[{r = Rationalize[p, 0]}, BitAnd[Accumulate@ Prepend[UnitStep[ RandomInteger[{-Numerator[r], Denominator[r] - Numerator[r] - 1}, len]], RandomInteger[]], 1]] --Mark
- References:
- Re: generating non-IID random sequences
- From: Yaroslav Bulatov <yaroslavvb@gmail.com>
- Re: generating non-IID random sequences