hbase-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: prefix compression
Date Sat, 04 Jun 2011 04:01:36 GMT
Yeah it's truly super wild!  Here's the code: http://pastebin.com/bnB53UQz

You can see the line that's adding the string:

fstBuilder.add(new BytesRef(date), new Long(x));
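
For a rough sense of why keys like these are so compressible, here is a minimal sketch that uses only the JDK, not the Lucene FST itself. It generates consecutive yyyyMMddHHmmssSSS keys one millisecond apart and measures how many characters of each key differ from the previous one. The class and method names (PrefixSharing, sharedPrefix, avgUnsharedSuffix) are made up for illustration, and the UTC time zone is an assumption. It only counts prefix sharing; an FST also shares suffixes, so this does not reproduce or verify the 984-byte figure.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class PrefixSharing {
    // Number of leading characters the two keys have in common.
    static int sharedPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Average number of characters per key that are NOT shared with the
    // previous key, for `count` keys spaced one millisecond apart.
    static double avgUnsharedSuffix(long startMillis, int count) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmssSSS");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC keys
        String prev = fmt.format(new Date(startMillis));
        long unshared = 0;
        for (int i = 1; i < count; i++) {
            String cur = fmt.format(new Date(startMillis + i));
            unshared += cur.length() - sharedPrefix(prev, cur);
            prev = cur;
        }
        return (double) unshared / (count - 1);
    }

    public static void main(String[] args) {
        // Each key is 17 characters; print how many of them are new on average.
        System.out.println(avgUnsharedSuffix(0L, 1000000));
    }
}
```

With one-millisecond increments the last digit always changes, the one before it changes every tenth key, and so on, so only a little over one character per 17-character key is new relative to its predecessor.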

On Fri, Jun 3, 2011 at 8:56 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> Jason - are you feeding it that whole string for each date?  Input data is
> 17 bytes per record * 50 million records = 850MB, and that reduces to 984 bytes?
>  Is it possible to compress by that much?  Maybe I'm missing something about
> how the FST works.
>
> Matt
>
>
> On Fri, Jun 3, 2011 at 8:51 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>
>> Also the next thing to measure with the FST is the key lookup speed.
>> I'm not sure what that'd look like, or how to compare with HBase right
>> now?
>>
>> On Fri, Jun 3, 2011 at 8:42 PM, Jason Rutherglen
>> <jason.rutherglen@gmail.com> wrote:
>> > Here's a nice preliminary number with the FST, 50 million dates of the
>> > form yyyyMMddHHmmssSSS, with each incremented by one millisecond.  The
>> > FST is 984 bytes, with an incrementing long to point to the presumably
>> > MMap'd value data.  This is a bit crazy.
>> >
>> > Perhaps we should try other increments as well?  Given that HBase keys
>> > especially are probably close increments of each other, I think the
>> > FST can always be loaded into RAM with pointers out to the actual
>> > values.
>> >
>>
>
