Re: finding the k-nearest neighbour
- To: mathgroup at smc.vnet.net
- Subject: [mg26167] Re: [mg26139] finding the k-nearest neighbour
- From: "Carl K. Woll" <carlw at u.washington.edu>
- Date: Thu, 30 Nov 2000 01:04:10 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Hi Johannes, A slight refinement of your idea is to replace Abs /@ (li-x) with Abs[li-x] so that your function definition becomes knearest[x_, li_List, k_Integer] := Sort[Abs[li - x]][[k]] The function Abs has the attribute Listable, so it will automatically be applied to each element in the list. As you mentioned in your post, your function is slow because it needs to sort the whole list. However, this is unnecessary, since you can extract the part of your list around the point x which might be a k-nearest neighbor. In order to do this extraction, you will need to come up with a function which will tell you where in the list your point x belongs. One slick way of doing this is to use the function Interpolation. For example, suppose your list is li = Sort[Table[Random[],{10000}] Then, you can create an index function as follows: index[x_] = Round[Interpolation[Transpose[{li, N[Range[0, Length[li] - 1]]}], InterpolationOrder -> 0][x]]; Note the use of = instead of := above. Since Interpolation automatically converts the function values to fixed precision numbers, I used N to convert the integers to fixed precision numbers before Interpolation does its thing. If I neglect to use N above, then the initial computation of the index function will take about 5 times longer. Of course, in either case, it takes a fraction of a second to actually compute the index of a number using the function index. Now, we can speed up your knearest function by extracting the relevant portion of the list before applying the Sort and Abs functions: kfaster[x_, li_List, k_Integer] := Module[ {b = Max[index[x] - k + 1, 1], e = Min[index[x] + k, Length[li]]}, Sort[Abs[Take[li, {b, e}] - x]][[k]]] As an example, li = Table[Random[], {100000}] // Sort; index[x_] = Round[Interpolation[Transpose[{li, N[Range[0, Length[li] - 1]]}], InterpolationOrder -> 0][x]]; Do[ans1 = knearest[.3, li, 100], {10}] // Timing Do[ans2 = kfaster[.3, li, 100], {10}] // Timing ans1 == ans2 {6.859 Second, Null} {0.016 Second, Null} True So, for this test case, kfaster is nearly 400 times faster. Of course, as k gets larger, the speed difference will decrease, until ultimately knearest will be marginally faster when k is close to the length of the list. Carl Woll Physics Dept U of Washington ----- Original Message ----- From: "Ludsteck" <Ludsteck at zew.de> To: mathgroup at smc.vnet.net Subject: [mg26167] [mg26139] finding the k-nearest neighbour > > Dear MathGroup members, > I have to find the distance between a number x and its k - nearest > neighbour in a list of numbers (where k is an integer and distance is > simply Abs). > This is quite easy in Mathematica. I use > > knearest[x_, li_List, k_Integer]:= Sort[ Abs/@(li - x) ][[k]] > > However, it is also very inefficient, since the whole list of absolute > deviations > has to be sorted. Now my lists are very long and I have to apply the > knearest[...] function > several thousands of times. Therefore I have to search for much faster > solutions. > I think the fasted way would be to use a priority queue of length k. > > Has someone other solutions or can someone provide ready-to-use code for > priority queues in Mathematica? > > Thank you very much, > Johannes > > > > Johannes Ludsteck > Forschungsbereich Arbeitsmaerkte, > Personalmanagement und Soziale Sicherung > Zentrum fuer europaeische Wirtschaftsforschung > Mannheim (ZEW) > Postfach 103443 > D 68934 Mannheim > > E-mail: ludsteck at zew.de >