       RE: Finding the closest number from a list

• To: mathgroup at smc.vnet.net
• Subject: [mg39531] RE: [mg39510] Finding the closest number from a list
• From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
• Date: Fri, 21 Feb 2003 04:11:01 -0500 (EST)
• Sender: owner-wri-mathgroup at wolfram.com

```>-----Original Message-----
>From: jrome at mail.com [mailto:jrome at mail.com]
To: mathgroup at smc.vnet.net
>Sent: Thursday, February 20, 2003 11:14 AM
>To: mathgroup at smc.vnet.net
>Subject: [mg39531] [mg39510] Finding the closest number from a list
>
[...]

>I have a seemingly simple problem. I want to find the element in a
>list that is closest to a number I specify. In my case, I have a list
>of about 30,000 numbers (y[i]). I want to find the closest match for
>each of about 10,000 other numbers (call them x[i]). Obviously, speed
>is important. I've sorted the large list, and right now I'm going
>through each y[i] from lowest to highest and testing it to see if x[i]
>is less than that value. This takes about .1 seconds for each x[i].
>
>I'm wondering if anyone has had a similar problem, and if there is a
>better function built-in to Mathematica. Alternatetively, I could
>build my own. I've just recently realized that I could also reduce the
>search time considerably if I sort the x[i] list as well, and only
>start my search from where I last left off.  Any ideas on which
>approach would be more efficient? Thanks.
>

Effectively this is a binning problem, delt with quite often in this list.
(So search the archive.)

What you didn't specify is the form of your output you desire. I assume it
should be a list of closest elements of list y, corresponding in order to
sorted list x. (You then easily may generate a list of replacement rules, or
just eat up the result for further processing, by Scan, Map or the like.)

The idea you communicated to simultaneously scan sorted y and sorted x lists
is just the sort-merge approach. That should be among the best algorithms if
compiled.

Another idea would be to sort y and then make a binary search for each
element of x to get at the two closest element of y, and the check.

Here I'd like to communicate another idea relies on list processing, here
step by step:

In:= y = Table[Random[], {30}]
Out=
{0.267278, 0.99553, 0.201683, 0.0341569, 0.914983, 0.981118, 0.421376, \
0.0633918, 0.426651, 0.433505, 0.963199, 0.616847, 0.865283, 0.926424, \
0.151319, 0.734706, 0.278981, 0.247546, 0.586438, 0.0635677, 0.314219, \
0.492426, 0.800959, 0.288232, 0.0469408, 0.496896, 0.599276, 0.254075, \
0.131958, 0.515779}

In:= x = Table[Random[], {10}]
Out=
{0.177901, 0.190683, 0.705306, 0.0822738, 0.214701, 0.573836, 0.840023, \
0.15585, 0.0633828, 0.83913}

The test data. (I'm sorry to include the outputs, but such you may exactly
reproduce the steps.)

In:= xx = Sort[x];   (* sorted x, to be used later *)

In:=
zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}]

Both lists y and x sorted together, the x are marked by complex I, such as
to lokate there positions in this combined sorted list. -Infinity and
Infinity are attached to the margins to avoid problems when any element of x
is smaller (or greater) than each element of y.

We search for the positions of marked x elements:

In:= xX = Flatten[Position[jj, HoldPattern[Complex[_, _]]]]
Out=
{4, 7, 10, 11, 12, 14, 27, 31, 34, 35}

In:= lowX = (c = 0;
FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] //
Rest)
Out=
{3, 6, 9, 9, 9, 13, 26, 30, 33, 33}

These are the corresponding positions of the next smaller element of y. You
see double occurances when x elements came out adjacent in the combined
list.

In:= highX = (c = 0;
FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity,
Reverse[xX] + 1] // Rest // Reverse)
Out=
{5, 8, 13, 13, 13, 15, 28, 32, 36, 36}

Same for the next greater elements.

In:= low = zz[[lowX]];
In:= high = zz[[highX]];

Now we take the next nearest of both candidates:

In:=
MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {low, xx, high}]
Out=
{0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \
0.734706, 0.865283, 0.865283}

put together:

closestMatchP[x_List, y_List] :=
Module[{xx = Sort[x],
zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}], xX, lowX,
highX, c},
xX = Flatten[Position[zz, HoldPattern[Complex[_, _]]]];
lowX = (c = 0;
FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] //

Rest);
highX = (c = 0;
FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity,
Reverse[xX] + 1] // Rest // Reverse);
MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {zz[[lowX]], xx,
zz[[highX]]}]
]

In:= closestMatchP[x, y]
Out=
{0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \
0.734706, 0.865283, 0.865283}

In:= closestMatchP[x, y + .5]
Out=
{0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.563568, \
0.701683, 0.814219, 0.814219}

In:= closestMatchP[x, y - .5]
Out=
{0.086438, 0.086438, 0.116847, 0.234706, 0.234706, 0.234706, 0.49553, \
0.49553, 0.49553, 0.49553}

Performance test:

In:= x2 = Table[Random[], {10000}];
In:= y2 = Table[Random[], {30000}];
In:= closestMatchP[x2, y2]; // Timing
Out= {2.564 Second, Null}

Not too bad, possibly not best; but I haven't got the time to check the
alternatives sketched above.

--
Hartmut Wolf

```

• Prev by Date: Re: Re: Showing thick lines - a problem?
• Next by Date: Re: On Packages and Buttons
• Previous by thread: Re: Re: Finding the closest number from a list
• Next by thread: Re: Finding the closest number from a list