MathGroup Archive 2003

[Date Index] [Thread Index] [Author Index]

Search the Archive

RE: Finding the closest number from a list

  • To: mathgroup at smc.vnet.net
  • Subject: [mg39531] RE: [mg39510] Finding the closest number from a list
  • From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
  • Date: Fri, 21 Feb 2003 04:11:01 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

>-----Original Message-----
>From: jrome at mail.com [mailto:jrome at mail.com]
To: mathgroup at smc.vnet.net
>Sent: Thursday, February 20, 2003 11:14 AM
>To: mathgroup at smc.vnet.net
>Subject: [mg39531] [mg39510] Finding the closest number from a list
>
[...]

>I have a seemingly simple problem. I want to find the element in a
>list that is closest to a number I specify. In my case, I have a list
>of about 30,000 numbers (y[i]). I want to find the closest match for
>each of about 10,000 other numbers (call them x[i]). Obviously, speed
>is important. I've sorted the large list, and right now I'm going
>through each y[i] from lowest to highest and testing it to see if x[i]
>is less than that value. This takes about .1 seconds for each x[i].
>
>I'm wondering if anyone has had a similar problem, and if there is a
>better function built-in to Mathematica. Alternatetively, I could
>build my own. I've just recently realized that I could also reduce the
>search time considerably if I sort the x[i] list as well, and only
>start my search from where I last left off.  Any ideas on which
>approach would be more efficient? Thanks.
>

Effectively this is a binning problem, delt with quite often in this list.
(So search the archive.)

What you didn't specify is the form of your output you desire. I assume it
should be a list of closest elements of list y, corresponding in order to
sorted list x. (You then easily may generate a list of replacement rules, or
just eat up the result for further processing, by Scan, Map or the like.)

The idea you communicated to simultaneously scan sorted y and sorted x lists
is just the sort-merge approach. That should be among the best algorithms if
compiled.

Another idea would be to sort y and then make a binary search for each
element of x to get at the two closest element of y, and the check. 

Here I'd like to communicate another idea relies on list processing, here
step by step:

In[1]:= y = Table[Random[], {30}]
Out[1]=
{0.267278, 0.99553, 0.201683, 0.0341569, 0.914983, 0.981118, 0.421376, \
0.0633918, 0.426651, 0.433505, 0.963199, 0.616847, 0.865283, 0.926424, \
0.151319, 0.734706, 0.278981, 0.247546, 0.586438, 0.0635677, 0.314219, \
0.492426, 0.800959, 0.288232, 0.0469408, 0.496896, 0.599276, 0.254075, \
0.131958, 0.515779}

In[2]:= x = Table[Random[], {10}]
Out[2]=
{0.177901, 0.190683, 0.705306, 0.0822738, 0.214701, 0.573836, 0.840023, \
0.15585, 0.0633828, 0.83913}

The test data. (I'm sorry to include the outputs, but such you may exactly
reproduce the steps.)

In[51]:= xx = Sort[x];   (* sorted x, to be used later *)

In[52]:=
zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}]

Both lists y and x sorted together, the x are marked by complex I, such as
to lokate there positions in this combined sorted list. -Infinity and
Infinity are attached to the margins to avoid problems when any element of x
is smaller (or greater) than each element of y.

We search for the positions of marked x elements:

In[53]:= xX = Flatten[Position[jj, HoldPattern[Complex[_, _]]]]
Out[53]=
{4, 7, 10, 11, 12, 14, 27, 31, 34, 35}

In[54]:= lowX = (c = 0; 
    FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] //
Rest)
Out[54]=
{3, 6, 9, 9, 9, 13, 26, 30, 33, 33}

These are the corresponding positions of the next smaller element of y. You
see double occurances when x elements came out adjacent in the combined
list.

In[55]:= highX = (c = 0; 
    FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity, 
          Reverse[xX] + 1] // Rest // Reverse)
Out[55]=
{5, 8, 13, 13, 13, 15, 28, 32, 36, 36}

Same for the next greater elements.

In[56]:= low = zz[[lowX]];
In[57]:= high = zz[[highX]];


Now we take the next nearest of both candidates:

In[58]:=
MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {low, xx, high}]
Out[58]=
{0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \
 0.734706, 0.865283, 0.865283}


put together:

closestMatchP[x_List, y_List] :=
  Module[{xx = Sort[x], 
      zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}], xX, lowX, 
      highX, c},
    xX = Flatten[Position[zz, HoldPattern[Complex[_, _]]]];
    lowX = (c = 0; 
        FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] //

          Rest);
    highX = (c = 0; 
        FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity, 
              Reverse[xX] + 1] // Rest // Reverse);
    MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {zz[[lowX]], xx,
zz[[highX]]}]
    ]


In[73]:= closestMatchP[x, y]
Out[73]=
{0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \
 0.734706, 0.865283, 0.865283}

In[74]:= closestMatchP[x, y + .5]
Out[74]=
{0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.563568, \
 0.701683, 0.814219, 0.814219}

In[75]:= closestMatchP[x, y - .5]
Out[75]=
{0.086438, 0.086438, 0.116847, 0.234706, 0.234706, 0.234706, 0.49553, \
 0.49553, 0.49553, 0.49553}



Performance test:

In[45]:= x2 = Table[Random[], {10000}];
In[46]:= y2 = Table[Random[], {30000}];
In[50]:= closestMatchP[x2, y2]; // Timing
Out[50]= {2.564 Second, Null}

Not too bad, possibly not best; but I haven't got the time to check the
alternatives sketched above.

--
Hartmut Wolf



  • Prev by Date: Re: Re: Showing thick lines - a problem?
  • Next by Date: Re: On Packages and Buttons
  • Previous by thread: Re: Re: Finding the closest number from a list
  • Next by thread: Re: Finding the closest number from a list