Re: Re: generating non-IID random sequences
- To: mathgroup at smc.vnet.net
- Subject: [mg79262] Re: [mg79230] Re: generating non-IID random sequences
- From: Carl Woll <carlw at wolfram.com>
- Date: Sun, 22 Jul 2007 04:26:04 -0400 (EDT)
- References: <f7kegd$58u$1@smc.vnet.net><f7poja$pac$1@smc.vnet.net> <200707210835.EAA29057@smc.vnet.net>
Yaroslav Bulatov wrote: >The Compiled one-liner (fun2) seems to be the fastest solution. > >I also compared it with a simple java solution, and it was about 2.7 >times faster (19 seconds vs 7 seconds). Using Java solution through >JLink seems to take about the same time as running stand-alone. >Running JVM with some optimization flags (server,compileThreshold) >took the time to 6 seconds. A simple C solution took 2 seconds, 1.5 >seconds with gcc -O9 optimization. > >The bottom line seems to be that efficient Mathematica solution can be >25 times faster than inefficient one, porting to C can give 10x speed- >up, and Java is somewhere in between. > > Here is a much faster solution: markov[len_, p_] := BitAnd[ Accumulate @ Prepend[ UnitStep[RandomReal[{-p, 1 - p}, len]], RandomInteger[]], 1 ] In[7]:= fun2[10^7, .9]; // Timing Out[7]= {4.656,Null} In[8]:= markov[10^7, .9]; // Timing Out[8]= {0.766,Null} A test that toggles occur about 10% of the time: In[10]:= res = markov[10^7, .9]; N@Total[BitXor[Most[res], Rest[res]]]/(Length[res] - 1) Out[11]= 0.0998825 A bit of explanation. In general, an efficient Mathematica algorithm will use vector/array operations as much as possible, so that the kernel function does the looping through a list. In this vein, both BitAnd[_, 1] and UnitStep[_] are vector operations relying on the listability of BitAnd and UnitStep. In fun2, the function NestList is slow because it has to loop through the list, and doesn't take advantage of vector operations. It might seem at first glance that a NestList type of approach is unavoidable, since your application needs to repeatedly apply a custom function to an initial value. However, there is at least one kernel function, Accumulate, that very quickly applys a particular function (Plus) to an initial value (I can't think of any others that are as quick). Accumulate is a new version 6 function, and it can be about an order of magnitude quicker than the old equivalent FoldList[Plus, 0, list] approach. So, using Accumulate is the key to the speed improvement. In order to apply Accumulate, I created a random vector having 0s with probability p and 1s with probability 1-p. Then applying Accumulate will create a list like: {0,0,0,1,1,1,2,2,2, ...} The even numbers correspond to one value and the odd numbers correspond to the other value, since toggling twice brings us back to the same value. One more comment. If your probability p is equal to a fraction with small integers, say p=num/den, then it is possible to speed things up further. For example, with p=.9, we have: In[21]:= UnitStep[RandomReal[{-.9, .1}, 10^7]]; // Timing UnitStep[RandomInteger[{-9, 0}, 10^7]]; // Timing Out[21]= {0.562,Null} Out[22]= {0.297,Null} This improvement brings the timing down to about .5 seconds, which is almost 10 times faster than your fun2 function. Carl Woll Wolfram Research >fun2 = Compile[{{n, _Integer}, {p, _Real}}, > NestList[ > If[# == 0, If[RandomReal[] < p, 0, 1], > If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]]; >Timing[fun2[4 10^7, .9]][[1]] > >public class hmm { > public static int[] sample(double p, int length) { > int[] result = new int[length]; > result[0]=(Math.random()>.5)?0:1; > for (int i = 1; i < length; ++i) > result[i]=(Math.random()>p)?(1-result[i-1]):result[i-1]; > return result; > } > > public static void main(String[] args) { > long start = System.currentTimeMillis(); > sample(.9,4*10000000); > System.out.println((System.currentTimeMillis()-start)/1000.); > } >} > > >#include <stdio.h> >#include <stdlib.h> >#include <time.h> >#define n 40000000 > >main() { > int* result = malloc(4*n); > double start = clock(); > int i; > result[0] = 0; > for (i = 1; i<n;++i) > result[i]=(rand()/((double)RAND_MAX + 1)>.9)?(1-result[i-1]): >(result[i-1]); > printf("%d\n",result[i-1]); > printf("%f\n",(clock()-start)/CLOCKS_PER_SEC); > return 0; >} > > >On Jul 20, 12:35 am, Mark Fisher <particlefil... at gmail.com> wrote: > > >>On Jul 18, 3:12 am, Yaroslav Bulatov <yarosla... at gmail.com> wrote: >> >> >> >>>I'm looking for a fast way to sample from a Markov-1 sequence of >>>random bits. The method below is 3600 times slower than built-in >>>RandomInteger function, can it be made much faster? >>> >>> >>>p = 0.9; (* the probability of encountering 00 or 11 *) >>>f = RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, 0}] >>>&; >>>NestList[f, RandomChoice[{0, 1}], 100000] >>> >>> >>Here are 3 functions. The first is your version; the second is your >>version compiled; the third is a slightly different version also >>compiled. >> >>fun = Function[{n, p}, >> NestList[ >> RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, >> 0}] &, RandomChoice[{0, 1}], n]]; >> >>fun1 = Compile[{{n, _Integer}, {p, _Real}}, >> NestList[ >> RandomChoice[{p^# (1 - p)^(1 - #), p^(1 - #) (1 - p)^#} -> {1, >> 0}] &, RandomChoice[{0, 1}], n]]; >> >>fun2 = Compile[{{n, _Integer}, {p, _Real}}, >> NestList[ >> If[# == 0, If[RandomReal[] < p, 0, 1], >> If[RandomReal[] < 1 - p, 0, 1]] &, RandomChoice[{0, 1}], n]]; >> >>This gives the relative timings: >> >>First[#]/#& @ Table[Timing[i[10^5, .9]][[1]], {i, {fun, fun1, fun2}}] >> >>On my machine I get a factor of 5 speed up for fun1 and a factor of 25 >>speed up for fun2 relative to fun. >> >>--Mark. >> >> > > > >
- References:
- Re: generating non-IID random sequences
- From: Yaroslav Bulatov <yaroslavvb@gmail.com>
- Re: generating non-IID random sequences