Re: How best to implement a hash table in Mathematica
- To: mathgroup at smc.vnet.net
- Subject: [mg125035] Re: How best to implement a hash table in Mathematica
- From: Szabolcs Horvát <szhorvat at gmail.com>
- Date: Sun, 19 Feb 2012 06:28:42 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jho21j$5sb$1@smc.vnet.net>
On 2/18/2012 1:28 PM, Joseph Gwinn wrote: > I have an application that resembles dictionary lookup, with significant > rates of collision. > > The current approach is to initialize the hash table, a 10^4 element > list of empty sublists, and then build the table by inserting document > object id numbers into the correct sublists as indexed by some important > property of the object. The choice of property varies with application. > The property may be quantized geographic location, or some attribute of > the object itself. > > When done building the hash table, query consists of reading the sublist > using an index derived in the same way as for the original building of > the list. Each sublist may have many entries, and the entries need not > be numbers. > > Anyway, I have a workable approach using > > HashList[[index]]= > Union[Flatten[Join[HashList[[index]],{new object ID}]]]] > > as the core update function, but I'm wondering how well this will scale > to say 10^6 sublists (which may be needed to keep the collisions under > control with large datasets), and of course if there is a better way in > Mathematica. > > I did look at SparseArrays, but they don't support elements that are > sublists. > > Although this hash table is one dimensional, I have uses for 2D and > maybe 3D tables, which are very common when one is hashing physical > location. > If you have an application similar to dictionary lookup, have you considered using the DownValues of a symbol? dict[_] = Null dict["key1"] = "value1" dict["key2"] = "value2" This is a mutable data structure which might not play very well with Mathematica's native paradigms, but it is often very useful. You can also consider using a list of rules: rules = {"key1" -> "value1", "key2" -> "value2"} You can build a dispatch table using Dispatch[rules] to make replacements very fast, but every time you add an element, it will be necessary to re-build the Dispatch object from the rules, which might take a while. -- Szabolcs Horvát Visit Mathematica.SE: http://mathematica.stackexchange.com/