MathGroup Archive 2012

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: How best to implement a hash table in Mathematica

In article <jhqmfb$f7i$1 at>,
 Szabolcs Horvát <szhorvat at> wrote:

> On 2/18/2012 1:28 PM, Joseph Gwinn wrote:
> > I have an application that resembles dictionary lookup, with significant
> > rates of collision.
> >
> > The current approach is to initialize the hash table, a 10^4 element
> > list of empty sublists, and then build the table by inserting document
> > object id numbers into the correct sublists as indexed by some important
> > property of the object.  The choice of property varies with application.
> > The property may be quantized geographic location, or some attribute of
> > the object itself.
> >
> > When done building the hash table, query consists of reading the sublist
> > using an index derived in the same way as for the original building of
> > the list.  Each sublist may have many entries, and the entries need not
> > be numbers.
> >
> > Anyway, I have a workable approach using
> >
> > HashList[[index]]=
> > Union[Flatten[Join[HashList[[index]],{new object ID}]]]]
> >
> > as the core update function, but I'm wondering how well this will scale
> > to say 10^6 sublists (which may be needed to keep the collisions under
> > control with large datasets), and of course if there is a better way in
> > Mathematica.
> >
> > I did look at SparseArrays, but they don't support elements that are
> > sublists.
> >
> > Although this hash table is one dimensional, I have uses for 2D and
> > maybe 3D tables, which are very common when one is hashing physical
> > location.
> >
> If you have an application similar to dictionary lookup, have you 
> considered using the DownValues of a symbol?
> dict[_] = Null
> dict["key1"] = "value1"
> dict["key2"] = "value2"
> This is a mutable data structure which might not play very well with 
> Mathematica's native paradigms, but it is often very useful.
> You can also consider using a list of rules:
> rules = {"key1" -> "value1", "key2" -> "value2"}

Each key value would need to support a perhaps empty perhaps large list 
of values.

> You can build a dispatch table using Dispatch[rules] to make 
> replacements very fast, but every time you add an element, it will be 
> necessary to re-build the Dispatch object from the rules, which might 
> take a while.

I did consider rules and Dispatch, but they seemed mismatched and thus 
overly complex (and slow) compared to the direct approach with lists.

Please see my response to David Bailey for the details of my approach.

Joe Gwinn

  • Prev by Date: Re: Controlling # Processors Used by MathKernel?
  • Next by Date: Re: Equation solving problem
  • Previous by thread: Re: How best to implement a hash table in Mathematica
  • Next by thread: Re: How best to implement a hash table in Mathematica