MathGroup Archive 2005

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Complement replacement

  • To: mathgroup at smc.vnet.net
  • Subject: [mg58739] Re: Complement replacement
  • From: "Carl K. Woll" <carlw at u.washington.edu>
  • Date: Sat, 16 Jul 2005 01:03:46 -0400 (EDT)
  • Organization: University of Washington
  • References: <200507100912.FAA06500@smc.vnet.net> <datacb$mg2$1@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

It occurred to me that SparseArray[list] needs to extract the nonzero 
information from list in order to create its sparse array internal format. 
We can use this mechanism to drop the zeros from a list:

list={1,0,3,0,4,2,3};

In[17]:=
SparseArray[list] /. SparseArray[_,_,_,x_]:>x[[3]]

Out[17]=
{1,3,4,2,3}

For large lists, we have

list=Table[Random[Integer],{10^6}];

In[19]:=
r1=DeleteCases[list,0];//Timing
r2=SparseArray[list]/.SparseArray[_,_,_,x_]->x[[3]];//Timing
r1===r2

Out[19]=
{0.265 Second,Null}

Out[20]=
{0.047 Second,Null}

Out[21]=
True

So, using SparseArrays is much quicker. Incorporating this mechanism for 
deleting zeros we have

missing[list_] := Module[{r = Range[Length[list]]},
r[[list]] = 0;
SparseArray[r] /. SparseArray[_, _, _, x_] :> x[[3]]]

Let's compare this to Complement:

pp = Table[Random[Integer, {1, 2 10^6}], {2 10^6}];

In[26]:=
r1=missing[pp];//Timing
r2=Complement[Range[Length[pp]],pp];//Timing
r1===r2

Out[26]=
{0.235 Second,Null}

Out[27]=
{1.703 Second,Null}

Out[28]=
True

This new version of missing is about 3 times faster than my old one, and 
almost an order of magnitude faster than using Complement for this size data 
set.

Carl Woll
Wolfram Research

SparseArray[tst] /. SparseArray[_,_,_,x_]:>x[[3]]
"Carl K. Woll" <carl at woll2woll.com> wrote in message 
news:datacb$mg2$1 at smc.vnet.net...
> ----- Original Message ----- 
> From: <konstantpi at mail15.com>
To: mathgroup at smc.vnet.net
> Subject: [mg58739]  Complement replacement
>
>
>> hi
>> in the list:
>> pp=Table[Random[Integer, {1, 1000}], {i, 1000}];
>> how could i know which numbers from 1 to 1000 does not exist in the
>> pp List.
>> but without using:
>> Complement[Table[i,{i,1000}],pp]
>> regards
>>
>
> Here is a possibility that is faster than Complement on my machine for 
> this
> particular problem:
>
> missing[list_] := Module[{len = Length[list], a},
> a = Range[len];
> a[[list]] = 0;
> DeleteCases[a, 0]]
>
> The above function scales as O(n), and Complement scales as O(n log n) due
> to sorting, so for large enough input my function ought to be faster. For
> example, with the data:
>
> n = 10^6;
> pp = Table[Random[Integer, {1, n}], {n}];
>
> We have the following comparison:
>
> In[26]:=
> c1=Complement[Range[Length[pp]],pp];//Timing
> c2=missing[pp];//Timing
> c1===c2
>
> Out[26]=
> {1.594 Second,Null}
>
> Out[27]=
> {0.625 Second,Null}
>
> Out[28]=
> True
>
> A further slight increase in speed ought to be possible if you compile
> DeleteCases[a,0] since a is a list of integers. For example, define
> deletezeros as:
>
> deletezeros = Compile[{{a, _Integer, 1}},
> Module[{b, len, j},
> len = Length[a];
> b = a;
> j = 0;
> Do[
> If[a[[i]] =!= 0, b[[++j]] = a[[i]]], {i, len}];
> Take[b, j]]]
>
> Then, replacing DeleteCases[a,0] with deletezeros[a] approximately doubles
> the speed.
>
> Carl Woll
> Wolfram Research
>
> 


  • Prev by Date: Re: Re: Graphical modeling of extinct lifeforms with Mathematica
  • Next by Date: Modeling and Array Problem
  • Previous by thread: Re: Complement replacement
  • Next by thread: Re: Complement replacement