MathGroup Archive: February 2012 [00410]

[Date Index] [Thread Index] [Author Index]

Re: How best to implement a hash table in Mathematica

To: mathgroup at smc.vnet.net
Subject: [mg125102] Re: How best to implement a hash table in Mathematica
From: A Retey <awnl at gmx-topmail.de>
Date: Tue, 21 Feb 2012 06:16:12 -0500 (EST)
Delivered-to: l-mathgroup@mail-archive0.wolfram.com
References: <jho21j$5sb$1@smc.vnet.net> <jhqmrr$fbq$1@smc.vnet.net> <jhstsv$nk0$1@smc.vnet.net>

Hi,

> The table has to be updated more or less at random as the dataset is
> digested.  What I'm doing is essentially your above method, but not all
> elements will ever receive a value.  In fact, most will not.
>
> The reason to initialize the hashtable in advance is so a later probe to
> an empty table element yields {}, versus repeating the question back.

Honestly, despite your additional explanations I don't feel I fully 
understand what you are trying to do, but the following assumes that it 
is essential that you use your own hash function instead of what 
Mathematica would internally use.

You should note that David's suggestion to use DownValues to store the 
key/value pairs will allow for returning empty lists for the keys that 
have not been met yet, even without preallocating entries or a priori 
knowledge of the hash index modulus. You would simply "initialize" with 
a blank pattern (see second line):

ClearAll[hashtable];
hashtable[_] = {};
Module[{index},
  Scan[(
      index = hashingfunction[#];
      hashtable[index] = Union[Join[hashtable[index], {#}]]
      ) &,
    dataset
  ];
]

It is difficult to make a statement about runtime performance compared 
to your approach, since it will depend on the hash function you are 
using. But I see some advantages beside runtime performance: the 
approach will work without knowledge of the hash index modulus and it 
will be more memory efficient if the modulus is very large and thus 
there are many empty values (in general I think that a large modulus is 
what you usually are after when looking for a good hash function).

You might want to try the above with your hash function, with

hashingfunction = Hash;
dataset = DictionaryLookup["*t*"];

the above will take 0.343 Seconds for about 41000 strings on my machine, 
which is presumably much slower than your machine. Using StringLength as 
hashfunction slows things down considerable, most probably because of 
the need to reallocate memory for the growing value arrays. I think that 
this is also dominating the runtime of your approach, and because 
everything is in one large array even more so. For my example with 
hashfunction=StringLength building the hash table using DownValues seems 
to be a lot faster...

hth,

albert

Prev by Date: Re: Extensive replacement of trigonometric functions

Next by Date: Re: Microarray data analysis

Previous by thread: Re: How best to implement a hash table in Mathematica

Next by thread: Re: How best to implement a hash table in Mathematica