MathGroup Archive 2000

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: finding the k-nearest neighbour

  • To: mathgroup at smc.vnet.net
  • Subject: [mg26167] Re: [mg26139] finding the k-nearest neighbour
  • From: "Carl K. Woll" <carlw at u.washington.edu>
  • Date: Thu, 30 Nov 2000 01:04:10 -0500 (EST)
  • Sender: owner-wri-mathgroup at wolfram.com

Hi Johannes,

A slight refinement of your idea is to replace

Abs /@ (li-x)

with

Abs[li-x]

so that your function definition becomes

knearest[x_, li_List, k_Integer] := Sort[Abs[li - x]][[k]]

The function Abs has the attribute Listable, so it will automatically be
applied to each element in the list.

As you mentioned in your post, your function is slow because it needs to
sort the whole list. However, this is unnecessary, since you can extract the
part of your list around the point x which might be a k-nearest neighbor. In
order to do this extraction, you will need to come up with a function which
will tell you where in the list your point x belongs. One slick way of doing
this is to use the function Interpolation. For example, suppose your list is

li = Sort[Table[Random[],{10000}]

Then, you can create an index function as follows:

index[x_] =
    Round[Interpolation[Transpose[{li, N[Range[0, Length[li] - 1]]}],
          InterpolationOrder -> 0][x]];

Note the use of = instead of := above. Since Interpolation automatically
converts the function values to fixed precision numbers, I used N to convert
the integers to fixed precision numbers before Interpolation does its thing.
If I neglect to use N above, then the initial computation of the index
function will take about 5 times longer. Of course, in either case, it takes
a fraction of a second to actually compute the index of a number using the
function index. Now, we can speed up your knearest function by extracting
the relevant portion of the list before applying the Sort and Abs functions:

kfaster[x_, li_List, k_Integer] :=
  Module[
    {b = Max[index[x] - k + 1, 1],
      e = Min[index[x] + k, Length[li]]},
    Sort[Abs[Take[li, {b, e}] - x]][[k]]]

As an example,

li = Table[Random[], {100000}] // Sort;
index[x_] =
    Round[Interpolation[Transpose[{li, N[Range[0, Length[li] - 1]]}],
          InterpolationOrder -> 0][x]];

Do[ans1 = knearest[.3, li, 100], {10}] // Timing
Do[ans2 = kfaster[.3, li, 100], {10}] // Timing
ans1 == ans2

{6.859 Second, Null}

{0.016 Second, Null}

True

So, for this test case, kfaster is nearly 400 times faster. Of course, as k
gets larger, the speed difference will decrease, until ultimately knearest
will be marginally faster when k is close to the length of the list.

Carl Woll
Physics Dept
U of Washington

----- Original Message -----
From: "Ludsteck" <Ludsteck at zew.de>
To: mathgroup at smc.vnet.net
Subject: [mg26167] [mg26139] finding the k-nearest neighbour


>
> Dear MathGroup members,
> I have to find the distance between a number x and its k - nearest
> neighbour in a list of numbers (where k is an integer and distance is
> simply Abs).
> This is quite easy in Mathematica. I use
>
> knearest[x_, li_List, k_Integer]:= Sort[ Abs/@(li - x) ][[k]]
>
> However, it is also very inefficient, since the whole list of absolute
> deviations
> has to be sorted. Now my lists are very long and I have to apply the
> knearest[...] function
> several thousands of times. Therefore I have to search for much faster
> solutions.
> I think the fasted way would be to use a priority queue of length k.
>
> Has someone other solutions or can someone provide ready-to-use code for
> priority queues in Mathematica?
>
> Thank you very much,
> Johannes
>
>
>
> Johannes Ludsteck
> Forschungsbereich Arbeitsmaerkte,
> Personalmanagement und Soziale Sicherung
> Zentrum fuer europaeische Wirtschaftsforschung
> Mannheim (ZEW)
> Postfach 103443
> D 68934 Mannheim
>
> E-mail: ludsteck at zew.de
>



  • Prev by Date: Re: Function compile problems
  • Next by Date: Re: How to find Complex roots !!
  • Previous by thread: Re: finding the k-nearest neighbour
  • Next by thread: Equivalent functions generate different plots