RE: Finding the closest number from a list
- To: mathgroup at smc.vnet.net
- Subject: [mg39531] RE: [mg39510] Finding the closest number from a list
- From: "Wolf, Hartmut" <Hartmut.Wolf at t-systems.com>
- Date: Fri, 21 Feb 2003 04:11:01 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
>-----Original Message----- >From: jrome at mail.com [mailto:jrome at mail.com] To: mathgroup at smc.vnet.net >Sent: Thursday, February 20, 2003 11:14 AM >To: mathgroup at smc.vnet.net >Subject: [mg39531] [mg39510] Finding the closest number from a list > [...] >I have a seemingly simple problem. I want to find the element in a >list that is closest to a number I specify. In my case, I have a list >of about 30,000 numbers (y[i]). I want to find the closest match for >each of about 10,000 other numbers (call them x[i]). Obviously, speed >is important. I've sorted the large list, and right now I'm going >through each y[i] from lowest to highest and testing it to see if x[i] >is less than that value. This takes about .1 seconds for each x[i]. > >I'm wondering if anyone has had a similar problem, and if there is a >better function built-in to Mathematica. Alternatetively, I could >build my own. I've just recently realized that I could also reduce the >search time considerably if I sort the x[i] list as well, and only >start my search from where I last left off. Any ideas on which >approach would be more efficient? Thanks. > Effectively this is a binning problem, delt with quite often in this list. (So search the archive.) What you didn't specify is the form of your output you desire. I assume it should be a list of closest elements of list y, corresponding in order to sorted list x. (You then easily may generate a list of replacement rules, or just eat up the result for further processing, by Scan, Map or the like.) The idea you communicated to simultaneously scan sorted y and sorted x lists is just the sort-merge approach. That should be among the best algorithms if compiled. Another idea would be to sort y and then make a binary search for each element of x to get at the two closest element of y, and the check. Here I'd like to communicate another idea relies on list processing, here step by step: In[1]:= y = Table[Random[], {30}] Out[1]= {0.267278, 0.99553, 0.201683, 0.0341569, 0.914983, 0.981118, 0.421376, \ 0.0633918, 0.426651, 0.433505, 0.963199, 0.616847, 0.865283, 0.926424, \ 0.151319, 0.734706, 0.278981, 0.247546, 0.586438, 0.0635677, 0.314219, \ 0.492426, 0.800959, 0.288232, 0.0469408, 0.496896, 0.599276, 0.254075, \ 0.131958, 0.515779} In[2]:= x = Table[Random[], {10}] Out[2]= {0.177901, 0.190683, 0.705306, 0.0822738, 0.214701, 0.573836, 0.840023, \ 0.15585, 0.0633828, 0.83913} The test data. (I'm sorry to include the outputs, but such you may exactly reproduce the steps.) In[51]:= xx = Sort[x]; (* sorted x, to be used later *) In[52]:= zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}] Both lists y and x sorted together, the x are marked by complex I, such as to lokate there positions in this combined sorted list. -Infinity and Infinity are attached to the margins to avoid problems when any element of x is smaller (or greater) than each element of y. We search for the positions of marked x elements: In[53]:= xX = Flatten[Position[jj, HoldPattern[Complex[_, _]]]] Out[53]= {4, 7, 10, 11, 12, 14, 27, 31, 34, 35} In[54]:= lowX = (c = 0; FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] // Rest) Out[54]= {3, 6, 9, 9, 9, 13, 26, 30, 33, 33} These are the corresponding positions of the next smaller element of y. You see double occurances when x elements came out adjacent in the combined list. In[55]:= highX = (c = 0; FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity, Reverse[xX] + 1] // Rest // Reverse) Out[55]= {5, 8, 13, 13, 13, 15, 28, 32, 36, 36} Same for the next greater elements. In[56]:= low = zz[[lowX]]; In[57]:= high = zz[[highX]]; Now we take the next nearest of both candidates: In[58]:= MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {low, xx, high}] Out[58]= {0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \ 0.734706, 0.865283, 0.865283} put together: closestMatchP[x_List, y_List] := Module[{xx = Sort[x], zz = Join[{-Infinity}, Sort[Join[y, x + I]], {Infinity}], xX, lowX, highX, c}, xX = Flatten[Position[zz, HoldPattern[Complex[_, _]]]]; lowX = (c = 0; FoldList[If[#2 - #1 === ++c, #1, c = 0; #2] &, -Infinity, xX - 1] // Rest); highX = (c = 0; FoldList[If[#2 - #1 === --c, #1, c = 0; #2] &, Infinity, Reverse[xX] + 1] // Rest // Reverse); MapThread[If[#3 - #2 > #2 - #1, #1, #3] &, {zz[[lowX]], xx, zz[[highX]]}] ] In[73]:= closestMatchP[x, y] Out[73]= {0.0633918, 0.0635677, 0.151319, 0.201683, 0.201683, 0.201683, 0.586438, \ 0.734706, 0.865283, 0.865283} In[74]:= closestMatchP[x, y + .5] Out[74]= {0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.534157, 0.563568, \ 0.701683, 0.814219, 0.814219} In[75]:= closestMatchP[x, y - .5] Out[75]= {0.086438, 0.086438, 0.116847, 0.234706, 0.234706, 0.234706, 0.49553, \ 0.49553, 0.49553, 0.49553} Performance test: In[45]:= x2 = Table[Random[], {10000}]; In[46]:= y2 = Table[Random[], {30000}]; In[50]:= closestMatchP[x2, y2]; // Timing Out[50]= {2.564 Second, Null} Not too bad, possibly not best; but I haven't got the time to check the alternatives sketched above. -- Hartmut Wolf