MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: Re: Map Question

  • To: mathgroup at smc.vnet.net
  • Subject: [mg45063] Re: [mg45048] Re: [mg45025] Map Question
  • From: Daniel Lichtblau <danl at wolfram.com>
  • Date: Mon, 15 Dec 2003 06:02:56 -0500 (EST)
  • References: <200312141122.GAA19804@smc.vnet.net>
  • Sender: owner-wri-mathgroup at wolfram.com

Ben Bongiorno wrote:
> 
> Mr. Lichtblau
> 
> Thank you for your promp response.
> 
> Does your Module change if the elements being compared (===) are
> strings?
> 
> Benedetto Bongiorno CPA, CRE
> BDO Seidman, LLP
> 700 North Pearl Street
> Suite 2000
> Dallas, TX 75201
> 214-665-0717
> bbongiorno at bdo.com
> 
> >>> Daniel Lichtblau <danl at wolfram.com> 12/13/03 01:35PM >>>
> Ben Bongiorno wrote:
> >
> > Happy Holidays,
> >
> > a = matrix variable of 22353 x 23 size
> > b = list variable of 21963 size
> > c = matrix variable of 21963 x 1 size
> >
> > My Do routine
> >
> > x=0;
> > Do[x=x+1;
> > c[[x]]=Select[a,#[[3]]===b[[x]]&],
> > {Length[b]}]
> >
> > How can I utilize Map to speed up this routine?
> >
> > Benedetto Bongiorno CPA, CRE
> > BDO Seidman, LLP
> > 700 North Pearl Street
> > Suite 2000
> > Dallas, TX 75201
> > 214-665-0717
> > bbongiorno at bdo.com
> > [...]
> 
> Actually your result is (in general) not a matrix at all. It may be
> described as a list of subsets of rows of the original matrix.
> 
> A pedestrian approach to do what you have in mind might be as follows.
> 
> cc = Table[
>   Select[aa,#[[3]]===bb[[j]]&],
>   {j,Length[bb]}]
> 
> This will be very slow because the complexity is
> O(length(aa)*length(bb)). Indeed, it is not even faster than your
> code,
> just slightly cleaner insofar as it does not require a list
> initialization. Here is an example around 200 times smaller than
> yours.
> 
> In[51]:= aa = Table[Random[Integer, 10], {1100}, {23}];
> 
> In[52]:= bb = Table[Random[Integer, 10], {2100}];
> 
> In[53]:= Timing[cc = Table[
>         Select[aa,#[[3]]===bb[[j]]&],
>         {j,Length[bb]}];]
> Out[53]= {16.55 Second, Null}
> 
> If your elements are, say, real numbers then you can use ordering to
> get
> a much faster code. First sort the matrix based on values of third
> elements in each row. Next sort the vector by values, keeping track of
> their original positions.
> 
> Now you simply walk over both sorted objects, increasing the matrix
> index when third element of a row is less than b vector element,
> increasing the vector index when the inequality is reversed, and
> stuffing a set of rows into the appropriate slots of the result while
> the two values are equal. Code to do this is below.
> 
> f[a_,b_] := Module[
>   {aaa, bbb, res, lena=Length[a], lenb=Length[bb], j=1, k=1, oldj},
>   aaa = aa[[Ordering[aa[[All,3]]]]];
>   bbb = Sort[Transpose[{bb,Range[lenb]}]];
>   res = Table[{}, {lenb}];
>   While [j<=lena && k<=lenb,
>     aj = aaa[[j]];
>     bk = bbb[[k]];
>     If [aj[[3]]<bk[[1]], j++; Continue[]
>       , (* else *)
>       If [aj[[3]]>bk[[1]], k++; Continue[]]
>       ];
>     oldj = j;
>     While [j<=lena && aaa[[j,3]]==bk[[1]], j++];
>     res[[bk[[2]]]] = aaa[[Range[oldj,j-1]]];
>     While [k<lenb && bbb[[k+1,1]]==bk[[1]],
>       k++; bk = bbb[[k]]; res[[bk[[2]]]] = aaa[[Range[oldj,j-1]]];
>       ];
>     ];
>   res
>   ]
> 
> Here is an example of the size you have in mind.
> 
> In[48]:= aa = Table[Random[Integer, 1000], {22353}, {23}];
> 
> In[49]:= bb = Table[Random[Integer, 1000], {21963}];
> 
> In[50]:= Timing[result = f[aa,bb];]
> Out[50]= {1.24 Second, Null}
> 
> Extrapolating from the first example, I would expect the pedestrian
> code
> to take about an hour on the full size example. It's still running.

Finished a few minutes after I posted this note. It took about 4140
seconds.

> Daniel Lichtblau
> Wolfram Research
> [...]


First let me note that one does not require real numbers or integers.
All that is needed is a way to test ordering as used by Sort and
Ordering. This can be done with Order. I will demonstrate with the same
example as before but the altered code should handle strings just fine.

f2[a_,b_] := Module[
  {aaa, bbb, res, lena=Length[a], lenb=Length[bb], j=1, k=1, oldj},
  aaa = aa[[Ordering[aa[[All,3]]]]];
  bbb = Sort[Transpose[{bb,Range[lenb]}]];
  res = Table[{}, {lenb}];
  While [j<=lena && k<=lenb,
    aj = aaa[[j]];
    bk = bbb[[k]];
    If [Order[aj[[3]],bk[[1]]]===1, j++; Continue[]
      , (* else *)
      If [Order[aj[[3]],bk[[1]]]===-1, k++; Continue[]]
      ];
    oldj = j;
    While [j<=lena && aaa[[j,3]]===bk[[1]], j++];
    res[[bk[[2]]]] = aaa[[Range[oldj,j-1]]];
    While [k<lenb && bbb[[k+1,1]]===bk[[1]],
      k++; bk = bbb[[k]]; res[[bk[[2]]]] = aaa[[Range[oldj,j-1]]];
      ];
    ];
  res
  ]

Next I will point out that this method has complexity n*log(n) due to
sorting, where n (measures the length of input(s). There is a method
with basically linear time performance subject to behavior of hash table
behavior. This typically means we expect linear behavior unless sizes
are huge, and moreover we expect to be hit with a substantial
multiplicative constant (so don't expect to beat the previous method
until sizes are fairly large).

The idea is to walk down the matrix, record rows in a hash table
associating them with the value of the third element. Then walk the
vector, extracting from the hash table the set of rows corresponding to
the value at each slot in that vector, in order to create the result.
Note that this approach does not involve ordering properties, hence can
be used for arbitrary element types.

f3[a_,b_] := Module[
  {myhash, res, lena=Length[a], lenb=Length[bb], j=1, k=1, oldj},
  Do [
    If [Head[myhash[aa[[j,3]]]]===myhash,
        myhash[aa[[j,3]]] = {aa[[j]]}
        , (* else *)
        myhash[aa[[j,3]]] = {myhash[aa[[j,3]]], aa[[j]]}
        ]
    , {j,lena}
    ];
  Table[
    If [Head[myhash[bb[[j]]]]===myhash,
        {}
        , (* else *)
        Partition[Flatten[myhash[bb[[j]]]], 23]
        ]
    , {j,lenb}
    ]
  ]

Below we compare the three codes on the reference example.

In[83]:= aa = Table[Random[Integer, 1000], {22353}, {23}];

In[84]:= bb = Table[Random[Integer, 1000], {21963}];

In[85]:= Timing[result = f[aa,bb];]
Out[85]= {1.17 Second, Null}

In[86]:= Timing[result2 = f2[aa,bb];]
Out[86]= {1.35 Second, Null}

In[87]:= Timing[result3 = f3[aa,bb];]
Out[87]= {3.84 Second, Null}

In[88]:= result3===result2===result
Out[88]= True


Daniel Lichtblau
Wolfram Research


  • Prev by Date: RE: Re: question Mathematica 5 "Print to the same line"
  • Next by Date: Re: Plot of x=y^2
  • Previous by thread: Re: Map Question
  • Next by thread: Mathematica 5 hangs when rendering plots with large numbers