[Date Index]
[Thread Index]
[Author Index]
RE: Finding the closest number from a list
*To*: mathgroup at smc.vnet.net
*Subject*: [mg39531] RE: [mg39510] Finding the closest number from a list
*From*: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
*Date*: Fri, 21 Feb 2003 04:11:01 -0500 (EST)
*Sender*: owner-wri-mathgroup at wolfram.com
>-----Original Message-----
>From: jrome at mail.com [mailto:jrome at mail.com]
To: mathgroup at smc.vnet.net
>Sent: Thursday, February 20, 2003 11:14 AM
>To: mathgroup at smc.vnet.net
>Subject: [mg39531] [mg39510] Finding the closest number from a list
>
[...]
>I have a seemingly simple problem. I want to find the element in a
>list that is closest to a number I specify. In my case, I have a list
>of about 30,000 numbers (y[i]). I want to find the closest match for
>each of about 10,000 other numbers (call them x[i]). Obviously, speed
>is important. I've sorted the large list, and right now I'm going
>through each y[i] from lowest to highest and testing it to see if x[i]
>is less than that value. This takes about .1 seconds for each x[i].
>
>I'm wondering if anyone has had a similar problem, and if there is a
>better function built-in to Mathematica. Alternatetively, I could
>build my own. I've just recently realized that I could also reduce the
>search time considerably if I sort the x[i] list as well, and only
>start my search from where I last left off. Any ideas on which
>approach would be more efficient? Thanks.
>
Effectively this is a binning problem, delt with quite often in this list.
(So search the archive.)
What you didn't specify is the form of your output you desire. I assume it
should be a list of closest elements of list y, corresponding in order to
sorted list x. (You then easily may generate a list of replacement rules, or
just eat up the result for further processing, by Scan, Map or the like.)
The idea you communicated to simultaneously scan sorted y and sorted x lists
is just the sort-merge approach. That should be among the best algorithms if
compiled.
Another idea would be to sort y and then make a binary search for each
element of x to get at the two closest element of y, and the check.
Here I'd like to communicate another idea relies on list processing, here
step by step:
In[1]:= y = Table[Random[], {30}]
Out[1]=
{0.267278, 0.99553, 0.201683, 0.0341569, 0.914983, 0.981118, 0.421376, \
0.0633918, 0.426651, 0.433505, 0.963199, 0.616847, 0.865283, 0.926424, \
0.151319, 0.734706, 0.278981, 0.247546, 0.586438, 0.0635677, 0.314219, \
0.492426, 0.800959, 0.288232, 0.0469408, 0.496896, 0.599276, 0.254075, \
0.131958, 0.515779}
In[2]:= x = Table[Random[], {10}]
Out[2]=
{0.177901, 0.190683, 0.705306, 0.0822738, 0.214701, 0.573836, 0.840023, \
0.15585, 0.0633828, 0.83913}
The test data. (I'm sorry to include the outputs, but such you may exactly
reproduce the steps.)
In[51]:= xx = Sort[x]; (* sorted x, to be used later *)
In[52]:=
zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}]
Both lists y and x sorted together, the x are marked by complex I, such as
to lokate there positions in this combined sorted list. -Infinity and
Infinity are attached to the margins to avoid problems when any element of x
is smaller (or greater) than each element of y.
We search for the positions of marked x elements:
In[53]:= xX = Flatten[Position[jj, HoldPattern[Complex[_, _]]]]
Out[53]=
{4, 7, 10, 11, 12, 14, 27, 31, 34, 35}
In[54]:= lowX = (c = 0;
FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] //
Rest)
Out[54]=
{3, 6, 9, 9, 9, 13, 26, 30, 33, 33}
These are the corresponding positions of the next smaller element of y. You
see double occurances when x elements came out adjacent in the combined
list.
In[55]:= highX = (c = 0;
FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity,
Reverse[xX] + 1] // Rest // Reverse)
Out[55]=
{5, 8, 13, 13, 13, 15, 28, 32, 36, 36}
Same for the next greater elements.
In[56]:= low = zz[[lowX]];
In[57]:= high = zz[[highX]];
Now we take the next nearest of both candidates:
In[58]:=
MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {low, xx, high}]
Out[58]=
{0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \
0.734706, 0.865283, 0.865283}
put together:
closestMatchP[x_List, y_List] :=
Module[{xx = Sort[x],
zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}], xX, lowX,
highX, c},
xX = Flatten[Position[zz, HoldPattern[Complex[_, _]]]];
lowX = (c = 0;
FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] //
Rest);
highX = (c = 0;
FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity,
Reverse[xX] + 1] // Rest // Reverse);
MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {zz[[lowX]], xx,
zz[[highX]]}]
]
In[73]:= closestMatchP[x, y]
Out[73]=
{0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \
0.734706, 0.865283, 0.865283}
In[74]:= closestMatchP[x, y + .5]
Out[74]=
{0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.563568, \
0.701683, 0.814219, 0.814219}
In[75]:= closestMatchP[x, y - .5]
Out[75]=
{0.086438, 0.086438, 0.116847, 0.234706, 0.234706, 0.234706, 0.49553, \
0.49553, 0.49553, 0.49553}
Performance test:
In[45]:= x2 = Table[Random[], {10000}];
In[46]:= y2 = Table[Random[], {30000}];
In[50]:= closestMatchP[x2, y2]; // Timing
Out[50]= {2.564 Second, Null}
Not too bad, possibly not best; but I haven't got the time to check the
alternatives sketched above.
--
Hartmut Wolf
Prev by Date:
**Re: Re: Showing thick lines - a problem?**
Next by Date:
**Re: On Packages and Buttons**
Previous by thread:
**Re: Re: Finding the closest number from a list**
Next by thread:
**Re: Finding the closest number from a list**
| |