hbase-dev mailing list archives

From Jacek Migdal <ja...@fb.com>
Subject Re: prefix compression implementation
Date Thu, 22 Sep 2011 00:23:46 GMT


On 9/20/11 6:04 PM, "Matt Corgan" <mcorgan@hotpads.com> wrote:

>jacek >> It is a huge chance. It would be great if we could prototype a few
>things. Especially I would like to avoid any optimizations before we know a
>good way to measure them.
>
>matt >> agree.  i'm not in a rush to get any of this integrated, just
>trying
>to feel out the right long-term strategy.  do you have unit tests that
>you're running on a substantial amount of data to compare different
>implementations?

I have some tests on production data that measure compression ratio. The
performance tests are synthetic and haven't measured real-world performance.
Right now I'm working on it.

Jacek

>
>On Tue, Sep 20, 2011 at 4:58 PM, Jacek Migdal <jacek@fb.com> wrote:
>
>>
>>
>> On 9/20/11 10:59 AM, "Matt Corgan" <mcorgan@hotpads.com> wrote:
>>
>> >bringing all questions into a single email:
>> >
>> >stack >> I'd say call it Cell rather than HCell.
>> >
>> >i did think the H was a very simple way to add uniqueness, like isn't
>> >"HFile" a big win over "File"?  there are already two other classes
>>called
>> >"Cell" in hbase (guava and REST gateway).  another option could be KV,
>> >though i don't like making exceptions to java's no-abbreviations
>> >guidelines.
>> KeyValueCell?
>>
>> To be honest, no name seems to be a very good option. However, it would
>>be
>> nice if it would be somewhat related to KeyValue.
>>
>> On large scope, it would be hard to integrate this interface anytime
>>soon.
>> I would rather do it later.
>>
>> >stack >> There is a patch lying around that adds a version to KV by
>>using
>> >top
>> >two bytes of the type byte.  If you need me to dig it up, just say
>> >(then you might not have to have v1 stuff in your Interface).
>> >
>> >not sure what you mean here.  top two bits?  you mean encoding the
>> >timestamp
>> >inside the type byte?
>> Versioning KeyValue per KeyValue seems to be crazy. Shouldn't it be per
>> block or file?
>>
>>
>> >(interface discussion)
>> >
>> It is a huge chance. It would be great if we could prototype a few things.
>> Especially I would like to avoid any optimizations before we know a good
>> way to measure them.
>>
>> Jacek
>>
>>

