Re: Optimizing a simple sampling function for speed
- To: mathgroup at smc.vnet.net
- Subject: [mg25387] Re: [mg25343] Optimizing a simple sampling function for speed
- From: "Rosa Ma. Seco" <ceie at prodigy.net.mx>
- Date: Fri, 29 Sep 2000 01:06:18 -0400 (EDT)
- Sender: owner-wri-mathgroup at wolfram.com
It seems to me that your dist has only 100 elements? In[1]:= dist = Table[Random[Integer, {1, 100}], {100}]; In[2]:= Length[dist] Out[2]= 100 Anyway, one possibility for sampling without replacement is using the function RandomKSubset from the add-on DiscreteMath`Combinatorica`: In[3]:= <<DiscreteMath`Combinatorica` Take dist2 now with 5000 elements: In[4]:= dist2= Table[Random[Integer,{1,10000}],{5000}]; Take a sample of 1000 without replacement: In[5]:= RandomKSubset[dist2,1000];//Timing Out[5]= {1.38 Second,Null} I'm sure this can be improved upon, but I'd like to have a clearer picture of the problem. Tomas Garza Mexico City Kimberly N. Russell [mailto:norris at cerc.columbia.edu] wrote: > In a lot of my work I have need for a function that draws a > sample of size n > without replacement from a set of objects in bins, where the > distribution is > given as a simple list. This function is used in simulations, so speed and > efficiency are important. Here is my attempt at doing this in Mathematica. > > First, define a function that removes one number at random from a list > without replacement. > > In[1]:= > removeone[l_] := Module[{chosenobs, chosencat}, > chosenobs = Random[]*Fold[Plus, 0, l]; > chosencat = > Extract[Position[Drop[FoldList[Plus, 0, l], 1], > x_ /; x ? chosenobs], {1, 1}]; > ReplacePart[l, l[[chosencat]] - 1, chosencat] > ] > > Nesting this allows a sample of size n to be removed, and subtracting that > from the original gives you the sample itself. > > In[2]:= > randsampcat[l_, n_] := l - Nest[removeone, l, n] > > So, create a distribution with approx 5000 objects in 1000 bins... > > In[10]:= > dist = Table[Random[Integer, {1, 100}], {100}]; > > ...and sample 1000 of them. > > In[9]:= > randsampcat[dist, 1000]; // Timing > > Out[9]= > {3.96667 Second, Null} > > I am sure that this can be improved upon. Can any of you > Mathematica wizards > show me how? I tried compiling removeone, but got the following > errors which > I don't really understand.