hadoop-common-user mailing list archives

From "Delip Rao" <delip...@gmail.com>
Subject Re: Indexed Hashtables
Date Fri, 16 Jan 2009 12:39:57 GMT
Thanks everyone for the suggestions! I tried all options so far except
Voldemort (Steve) and here's my evaluation:

memcached (Sean) -- very fast. A good option if used in front of an
existing, slower index.
MapFile (Peter) -- an excellent option that ships with Hadoop, but it is
very slow for a large number of key/value pairs. HashStore had the
same problem.

We initially started with HBase but found it very hard to set up, and
when we did, it wasn't kind to our modest academic cluster with its
limited memory. HBase is a great option otherwise. Our
requirements were very simple -- we have a few million key/value pairs
(both strings) that need to be looked up frequently. The solution I
ended up with was a simple trie-based hash for the keys, storing the index
of the corresponding values, which are kept on disk.
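
The trie-plus-disk approach described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual implementation from this thread (which was not posted): the class name TrieDiskIndex is hypothetical, keys live in an in-memory trie whose terminal nodes hold a byte offset, and values are appended to a single local file with RandomAccessFile.writeUTF (so each value is limited to 65535 bytes of encoded data). A HashMap per trie node is simple but memory-hungry; for millions of keys a more compact node layout would likely be needed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in-memory trie over keys, values stored on disk.
public class TrieDiskIndex implements AutoCloseable {
    // One node per key prefix; offset stays -1 unless a key ends at this node.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long offset = -1;
    }

    private final Node root = new Node();
    private final RandomAccessFile values;

    // Opens the value file and truncates it so the trie and file stay in sync.
    public TrieDiskIndex(Path valueFile) throws IOException {
        this.values = new RandomAccessFile(valueFile.toFile(), "rw");
        this.values.setLength(0);
    }

    // Appends the value to the end of the file and records its offset under the key.
    // Assumption: writeUTF limits each encoded value to 65535 bytes.
    public void put(String key, String value) throws IOException {
        long offset = values.length();
        values.seek(offset);
        values.writeUTF(value);
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.offset = offset; // re-putting a key leaves the old bytes unreferenced
    }

    // Walks the trie in memory; only a hit costs one disk seek and read.
    public String get(String key) throws IOException {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.get(key.charAt(i));
            if (node == null) {
                return null; // no key with this prefix
            }
        }
        if (node.offset < 0) {
            return null; // prefix exists, but no key ends here
        }
        values.seek(node.offset);
        return values.readUTF();
    }

    @Override
    public void close() throws IOException {
        values.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("values", ".dat");
        try (TrieDiskIndex idx = new TrieDiskIndex(tmp)) {
            idx.put("hadoop", "distributed computing");
            idx.put("hbase", "column store");
            System.out.println(idx.get("hbase")); // prints "column store"
            System.out.println(idx.get("hadoo")); // prints "null": a prefix, not a key
        }
        Files.delete(tmp);
    }
}
```

Keeping only keys and 8-byte offsets in memory is what makes this fit on a small cluster node: the strings' values never need to be resident, and a miss is answered without touching the disk at all.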


On Thu, Jan 15, 2009 at 4:14 PM, Jim Twensky <jim.twensky@gmail.com> wrote:
> Delip,
> Why do you think HBase would be overkill? I do something similar to what
> you're trying to do with HBase and I haven't encountered any significant
> problems so far. Can you give some more info on the size of the data you
> have?
> Jim
> On Wed, Jan 14, 2009 at 8:47 PM, Delip Rao <deliprao@gmail.com> wrote:
>> Hi,
>> I need to look up a large number of key/value pairs in my map(). Is
>> there any indexed hashtable available as part of the Hadoop I/O API?
>> I find HBase overkill for my application; something along the lines of
>> HashStore (www.cellspark.com/hashstore.html) should be fine.
>> Thanks,
>> Delip
