hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pete Haidinyak" <javam...@cox.net>
Subject Re: OT - Hash Code Creation
Date Thu, 17 Mar 2011 07:30:48 GMT
Thanks, I'll give that a try.

-Pete

On Thu, 17 Mar 2011 00:23:00 -0700, Ted Dunning <tdunning@maprtech.com>  
wrote:

> Double hashing is a find thing.  To actually answer the question,  
> though, I
> would recommend Murmurhash or JOAAT (
> http://en.wikipedia.org/wiki/Jenkins_hash_function)
>
> On Wed, Mar 16, 2011 at 3:48 PM, Andrey Stepachev <octo47@gmail.com>  
> wrote:
>
>> Try hash table with double hashing.
>> Something like this
>>
>> http://www.java2s.com/Code/Java/Collections-Data-Structure/Hashtablewithdoublehashing.htm
>>
>> 2011/3/17 Peter Haidinyak <phaidinyak@local.com>
>>
>> > Hi,
>> >        This is a little off topic but this group seems pretty swift  
>> so I
>> > thought I would ask. I am aggregating a day's worth of log data which
>> means
>> > I have a Map of over 24 million elements. What would be a good  
>> algorithm
>> to
>> > use for generating Hash Codes for these elements that cut down on
>> > collisions? I application starts out reading in a log (144 logs in  
>> all)
>> in
>> > about 20 seconds and by the time I reach the last log it is taking  
>> around
>> > 120 seconds. The extra 100 seconds have to do with Hash Table  
>> Collisions.
>> > I've played around with different Hashing algorithms and cut the  
>> original
>> > time from over 300 seconds to 120 but I know I can do better.
>> > The key I am using for the Map is an alpha-numeric string that is
>> > approximately 16 character long with the last 4 or 5 character being  
>> the
>> > most unique.
>> >
>> > Any ideas?
>> >
>> > Thanks
>> >
>> > -Pete
>> >


Mime
View raw message