Re: Complement replacement

*To*: mathgroup at smc.vnet.net*Subject*: [mg58739] Re: Complement replacement*From*: "Carl K. Woll" <carlw at u.washington.edu>*Date*: Sat, 16 Jul 2005 01:03:46 -0400 (EDT)*Organization*: University of Washington*References*: <200507100912.FAA06500@smc.vnet.net> <datacb$mg2$1@smc.vnet.net>*Sender*: owner-wri-mathgroup at wolfram.com

It occurred to me that SparseArray[list] needs to extract the nonzero information from list in order to create its sparse array internal format. We can use this mechanism to drop the zeros from a list: list={1,0,3,0,4,2,3}; In[17]:= SparseArray[list] /. SparseArray[_,_,_,x_]:>x[[3]] Out[17]= {1,3,4,2,3} For large lists, we have list=Table[Random[Integer],{10^6}]; In[19]:= r1=DeleteCases[list,0];//Timing r2=SparseArray[list]/.SparseArray[_,_,_,x_]->x[[3]];//Timing r1===r2 Out[19]= {0.265 Second,Null} Out[20]= {0.047 Second,Null} Out[21]= True So, using SparseArrays is much quicker. Incorporating this mechanism for deleting zeros we have missing[list_] := Module[{r = Range[Length[list]]}, r[[list]] = 0; SparseArray[r] /. SparseArray[_, _, _, x_] :> x[[3]]] Let's compare this to Complement: pp = Table[Random[Integer, {1, 2 10^6}], {2 10^6}]; In[26]:= r1=missing[pp];//Timing r2=Complement[Range[Length[pp]],pp];//Timing r1===r2 Out[26]= {0.235 Second,Null} Out[27]= {1.703 Second,Null} Out[28]= True This new version of missing is about 3 times faster than my old one, and almost an order of magnitude faster than using Complement for this size data set. Carl Woll Wolfram Research SparseArray[tst] /. SparseArray[_,_,_,x_]:>x[[3]] "Carl K. Woll" <carl at woll2woll.com> wrote in message news:datacb$mg2$1 at smc.vnet.net... > ----- Original Message ----- > From: <konstantpi at mail15.com> To: mathgroup at smc.vnet.net > Subject: [mg58739] Complement replacement > > >> hi >> in the list: >> pp=Table[Random[Integer, {1, 1000}], {i, 1000}]; >> how could i know which numbers from 1 to 1000 does not exist in the >> pp List. >> but without using: >> Complement[Table[i,{i,1000}],pp] >> regards >> > > Here is a possibility that is faster than Complement on my machine for > this > particular problem: > > missing[list_] := Module[{len = Length[list], a}, > a = Range[len]; > a[[list]] = 0; > DeleteCases[a, 0]] > > The above function scales as O(n), and Complement scales as O(n log n) due > to sorting, so for large enough input my function ought to be faster. For > example, with the data: > > n = 10^6; > pp = Table[Random[Integer, {1, n}], {n}]; > > We have the following comparison: > > In[26]:= > c1=Complement[Range[Length[pp]],pp];//Timing > c2=missing[pp];//Timing > c1===c2 > > Out[26]= > {1.594 Second,Null} > > Out[27]= > {0.625 Second,Null} > > Out[28]= > True > > A further slight increase in speed ought to be possible if you compile > DeleteCases[a,0] since a is a list of integers. For example, define > deletezeros as: > > deletezeros = Compile[{{a, _Integer, 1}}, > Module[{b, len, j}, > len = Length[a]; > b = a; > j = 0; > Do[ > If[a[[i]] =!= 0, b[[++j]] = a[[i]]], {i, len}]; > Take[b, j]]] > > Then, replacing DeleteCases[a,0] with deletezeros[a] approximately doubles > the speed. > > Carl Woll > Wolfram Research > >

**References**:**Complement replacement***From:*konstantpi@mail15.com