MathGroup Archive 2012

[Date Index] [Thread Index] [Author Index]

Search the Archive

Re: How best to implement a hash table in Mathematica

  • To: mathgroup at smc.vnet.net
  • Subject: [mg125035] Re: How best to implement a hash table in Mathematica
  • From: Szabolcs Horvát <szhorvat at gmail.com>
  • Date: Sun, 19 Feb 2012 06:28:42 -0500 (EST)
  • Delivered-to: l-mathgroup@mail-archive0.wolfram.com
  • References: <jho21j$5sb$1@smc.vnet.net>

On 2/18/2012 1:28 PM, Joseph Gwinn wrote:
> I have an application that resembles dictionary lookup, with significant
> rates of collision.
>
> The current approach is to initialize the hash table, a 10^4 element
> list of empty sublists, and then build the table by inserting document
> object id numbers into the correct sublists as indexed by some important
> property of the object.  The choice of property varies with application.
> The property may be quantized geographic location, or some attribute of
> the object itself.
>
> When done building the hash table, query consists of reading the sublist
> using an index derived in the same way as for the original building of
> the list.  Each sublist may have many entries, and the entries need not
> be numbers.
>
> Anyway, I have a workable approach using
>
> HashList[[index]]=
> Union[Flatten[Join[HashList[[index]],{new object ID}]]]]
>
> as the core update function, but I'm wondering how well this will scale
> to say 10^6 sublists (which may be needed to keep the collisions under
> control with large datasets), and of course if there is a better way in
> Mathematica.
>
> I did look at SparseArrays, but they don't support elements that are
> sublists.
>
> Although this hash table is one dimensional, I have uses for 2D and
> maybe 3D tables, which are very common when one is hashing physical
> location.
>

If you have an application similar to dictionary lookup, have you 
considered using the DownValues of a symbol?

dict[_] = Null
dict["key1"] = "value1"
dict["key2"] = "value2"

This is a mutable data structure which might not play very well with 
Mathematica's native paradigms, but it is often very useful.

You can also consider using a list of rules:

rules = {"key1" -> "value1", "key2" -> "value2"}

You can build a dispatch table using Dispatch[rules] to make 
replacements very fast, but every time you add an element, it will be 
necessary to re-build the Dispatch object from the rules, which might 
take a while.

-- 
Szabolcs Horvát
Visit Mathematica.SE:  http://mathematica.stackexchange.com/



  • Prev by Date: Re: How best to implement a hash table in Mathematica
  • Next by Date: Iterative solution to a transcendental equation.
  • Previous by thread: Re: How best to implement a hash table in Mathematica
  • Next by thread: Re: How best to implement a hash table in Mathematica