Re: How best to implement a hash table in Mathematica
- To: mathgroup at smc.vnet.net
- Subject: [mg125035] Re: How best to implement a hash table in Mathematica
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Sun, 19 Feb 2012 06:28:42 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jho21j$5sb$1@smc.vnet.net>
On 2/18/2012 1:28 PM, Joseph Gwinn wrote:
> I have an application that resembles dictionary lookup, with significant
> rates of collision.
>
> The current approach is to initialize the hash table, a 10^4 element
> list of empty sublists, and then build the table by inserting document
> object id numbers into the correct sublists as indexed by some important
> property of the object. The choice of property varies with application.
> The property may be quantized geographic location, or some attribute of
> the object itself.
>
> When done building the hash table, query consists of reading the sublist
> using an index derived in the same way as for the original building of
> the list. Each sublist may have many entries, and the entries need not
> be numbers.
>
> Anyway, I have a workable approach using
>
> HashList[[index]]=
> Union[Flatten[Join[HashList[[index]],{new object ID}]]]]
>
> as the core update function, but I'm wondering how well this will scale
> to say 10^6 sublists (which may be needed to keep the collisions under
> control with large datasets), and of course if there is a better way in
> Mathematica.
>
> I did look at SparseArrays, but they don't support elements that are
> sublists.
>
> Although this hash table is one dimensional, I have uses for 2D and
> maybe 3D tables, which are very common when one is hashing physical
> location.
>
If you have an application similar to dictionary lookup, have you
considered using the DownValues of a symbol?
dict[_] = Null
dict["key1"] = "value1"
dict["key2"] = "value2"
This is a mutable data structure which might not play very well with
Mathematica's native paradigms, but it is often very useful.
You can also consider using a list of rules:
rules = {"key1" -> "value1", "key2" -> "value2"}
You can build a dispatch table using Dispatch[rules] to make
replacements very fast, but every time you add an element, it will be
necessary to re-build the Dispatch object from the rules, which might
take a while.
--
Szabolcs Horvát
Visit Mathematica.SE: http://mathematica.stackexchange.com/