MathGroup Archive 2006

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Extract values and multilpicities from list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg64902] Re: [mg64849] Extract values and multilpicities from list
  • From: "Carl K. Woll" <carl at woll2woll.com>
  • Date: Tue, 7 Mar 2006 06:11:52 -0500 (EST)
  • References: <200603050819.DAA09793@smc.vnet.net> <440C37AE.3050207@wolfram.com>
  • Sender: owner-wri-mathgroup at wolfram.com

Carl K. Woll wrote:
> Dr. Wolfgang Hintze wrote:
> 
>> Given a list of integers which may repeat, e.g.
>>
>> lstIn = {2,3,4,4,2,1,1,5,4}
>>
>> provide a list of different values and their respective multiplicities,
>> in the example,
>>
>> LstOut= {{1,2},{2,2},{3,1},{4,3},{5,1}}
>>
>> Who finds the shortest function doing this task in general?
>>
>> Thanks.
>>
>> Best regards,
>> Wolfgang
> 
> 
> For the special case where the integers are positive and not too large, 
> then the following approach is probably close to the fastest.
> 
> runs[d_] :=
>  SparseArray[Transpose[{d, Range[Length[d]]}] -> Table[1, {Length[d]}]] /.
>   SparseArray[_, _, _, {_, {p_, __}, _}] :>
>    Transpose[{Range[Length[p] - 1], Rest[p] - Most[p]}]
> 
> The simplest solution is probably:
> 
> runs2[d_] := {First[#], Length[#]} & /@ Split[Sort[d]]
> 
> The output of runs will include integers with 0 length runs, so we need 
> to eliminate them when comparing results:
> 
> In[22]:=
> r1=runs[data];//Timing
> r2=runs2[data];//Timing
> DeleteCases[r1,{_,0}]===r2
> 
> Out[22]=
> {0.485 Second,Null}
> 
> Out[23]=
> {0.781 Second,Null}
> 
> Out[24]=
> True
> 
> Carl Woll
> Wolfram Research
> 

I forgot to mention the possibility of using Compile. Here is a version 
with Compile that is a bit faster, and takes much less memory:

runs3[d_] := With[{max = Max[d]},
    Transpose[{
       Range[max],
       Compile[{{dat, _Integer, 1}},
                 Module[{bins = Table[0, {max}], i},
                       Do[bins[[dat[[i]]]] += 1, {i, Length[dat]}];
                       bins
                   ]][d]
    }]
];

Here are timings on a data set with 10^7 elements:

In[42]:=
r1=runs3[data];//Timing
r2=runs[data];//Timing
r1===r2

Out[42]=
{3.218 Second,Null}

Out[43]=
{4.891 Second,Null}

Out[44]=
True

Carl Woll
Wolfram Research


  • Prev by Date: Re: Compile Fourier (2)
  • Next by Date: Re: Re: Multiple application of LinearFilter
  • Previous by thread: Re: Extract values and multilpicities from list
  • Next by thread: Re: Extract values and multilpicities from list