Re: How best to implement a hash table in Mathematica
- To: mathgroup at smc.vnet.net
- Subject: [mg125102] Re: How best to implement a hash table in Mathematica
- From: A Retey <awnl at gmx-topmail.de>
- Date: Tue, 21 Feb 2012 06:16:12 -0500 (EST)
- Delivered-to: l-mathgroup@mail-archive0.wolfram.com
- References: <jho21j$5sb$1@smc.vnet.net> <jhqmrr$fbq$1@smc.vnet.net> <jhstsv$nk0$1@smc.vnet.net>
Hi, > The table has to be updated more or less at random as the dataset is > digested. What I'm doing is essentially your above method, but not all > elements will ever receive a value. In fact, most will not. > > The reason to initialize the hashtable in advance is so a later probe to > an empty table element yields {}, versus repeating the question back. Honestly, despite your additional explanations I don't feel I fully understand what you are trying to do, but the following assumes that it is essential that you use your own hash function instead of what Mathematica would internally use. You should note that David's suggestion to use DownValues to store the key/value pairs will allow for returning empty lists for the keys that have not been met yet, even without preallocating entries or a priori knowledge of the hash index modulus. You would simply "initialize" with a blank pattern (see second line): ClearAll[hashtable]; hashtable[_] = {}; Module[{index}, Scan[( index = hashingfunction[#]; hashtable[index] = Union[Join[hashtable[index], {#}]] ) &, dataset ]; ] It is difficult to make a statement about runtime performance compared to your approach, since it will depend on the hash function you are using. But I see some advantages beside runtime performance: the approach will work without knowledge of the hash index modulus and it will be more memory efficient if the modulus is very large and thus there are many empty values (in general I think that a large modulus is what you usually are after when looking for a good hash function). You might want to try the above with your hash function, with hashingfunction = Hash; dataset = DictionaryLookup["*t*"]; the above will take 0.343 Seconds for about 41000 strings on my machine, which is presumably much slower than your machine. Using StringLength as hashfunction slows things down considerable, most probably because of the need to reallocate memory for the growing value arrays. I think that this is also dominating the runtime of your approach, and because everything is in one large array even more so. For my example with hashfunction=StringLength building the hash table using DownValues seems to be a lot faster... hth, albert