hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Date Wed, 30 May 2012 19:57:30 GMT
@Anoop: We recently finished the first phase of our POC, and it went quite well.
Now we are deciding which features to use in the final
implementation. We are still in research mode, trying out different options,
including the LZO and Snappy compression algorithms. Yes, in POC V1 the custom
mapper for my bulk loader also passed the same current-time value (in millis)
for every cell of a single row. I can easily change the loader to use 0L as the
timestamp for all data.
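
To illustrate why a constant timestamp such as 0L pairs well with a delta-style block encoding, here is a toy Java sketch. It is not the on-disk format of DiffKeyDeltaEncoder or FastDiffDeltaEncoder; the class name, the zigzag/varint scheme, and the layout are assumptions made purely for illustration. It stores the first timestamp in full and then one small delta per later cell, so some bytes are still written, but far fewer than 8 per cell.

```java
import java.io.ByteArrayOutputStream;

public class TimestampDeltaSketch {
    /** Writes v as an unsigned varint: 7 payload bits per byte, high bit set when more bytes follow. */
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Zigzag mapping so that small negative deltas also become small unsigned values. */
    static long zigzag(long v) {
        return (v << 1) ^ (v >> 63);
    }

    /** Toy layout (hypothetical): first timestamp as 8 bytes, then one zigzag varint delta per later cell. */
    static byte[] encode(long[] timestamps) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (timestamps.length == 0) {
            return out.toByteArray();
        }
        long first = timestamps[0];
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) (first >>> shift) & 0xFF);
        }
        for (int i = 1; i < timestamps.length; i++) {
            writeVarint(out, zigzag(timestamps[i] - timestamps[i - 1]));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] constant = new long[1000]; // every cell written with timestamp 0L
        System.out.println("raw bytes:     " + constant.length * 8);
        System.out.println("encoded bytes: " + encode(constant).length);
    }
}
```

With identical timestamps every delta is zero and costs one byte in this toy scheme, which is the kind of saving the encoders are described as providing at the block level.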

@Matt: We are using the Cloudera distribution at present, so I will need to
ask the Cloudera folks about the HBase version used in CDH4 (at present it's
0.92). I looked at the HBase site and the current stable version is 0.92, so it
seems unlikely that 0.96 will be a stable release in the next 3-4
months. Anyway, any idea when a stable HBase 0.96 is supposed to be released?

> HBase-6093 seems to be very close to my suggestion. The only difference is
> that Matt mentioned in the description that it can only be used when all
> inserts are type=Put. Is aforementioned restriction due to HFileV2? I think
> deleting an entire row wouldn't be a problem. right?

Any inputs on the above question?

On Tue, May 29, 2012 at 9:26 PM, Anoop Sam John <anoopsj@huawei.com> wrote:

> Hi Anil,
>         As HBASE-4676 is not available as of now, maybe you can check the
> other encoders, DiffKeyDeltaEncoder or FastDiffDeltaEncoder.
> Please go through the javadoc of these and see what they do apart from
> compressing the timestamp parts. These do other nice stuff too, which will
> make your data smaller on disk.
>
> When HBASE-4676 arrives you can try using that, as it would be closer to
> your need, I think.
>
> Also please make sure to set the timestamp to 0L in all your Puts. If you don't
> do that, HBase will set the current time in millis as the timestamp for each
> Put.
>
> -Anoop-
> ________________________________________
> From: Matt Corgan [mcorgan@hotpads.com]
> Sent: Wednesday, May 30, 2012 5:16 AM
> To: user@hbase.apache.org
> Subject: Re: Disable timestamp in HBase Table a.k.a Disable Versioning in
> HBase Table
>
> >
> > Is this feature going to be part of any future release of HBase?
>
> i couldn't get it finished in time for 0.94, but i think it's very likely
> to be in 0.96, possibly with a backport to .94.  Scan speed should improve
> if i have time to optimize the cell comparators and collators
>
>
> On Tue, May 29, 2012 at 4:29 PM, anil gupta <anilgupta84@gmail.com> wrote:
>
> > Hi All,
> >
> > Sorry for the late reply; I got stuck on another task at work on Friday, and
> > skimming through HBASE-4676 took me a while.
> >
> > HBASE-6093 seems to be very close to my suggestion. The only difference is
> > that Matt mentioned in the description that it can only be used when all
> > inserts are type=Put. Is the aforementioned restriction due to HFileV2? I think
> > deleting an entire row wouldn't be a problem, right? I have very little
> > knowledge about HFileV2. I will try to read about it soon.
> >
> > HBASE-4676 seems really cool. IMHO, the current issue is that writes and
> > scans are slow (scans slower by ~2x compared to NONE, assuming the trie
> > compresses by ~2-3x), and as per the JIRA, if the value/key ratio is big then
> > the trie won't have any impact. Is this feature going to be part of any future
> > release of HBase?  Awesome stuff, Matt.
> >
> > @Anoop: You meant that I should use the feature in HBASE-4676 and pass the
> > timestamp as 0L in each Put, right?
> >
> > Thanks all for your valuable time and inputs.
> > -Anil
> >
> >
> > On Thu, May 24, 2012 at 11:22 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> >
> > > Hi Anil,
> > >
> > > I created HBASE-6093
> > > <https://issues.apache.org/jira/browse/HBASE-6093> with an idea that
> > > could solve this problem.  It could be a simple
> > > implementation for simple workloads, but gets harder to support for tables
> > > with TTL's, maxVersion > 1, Deletes, etc...  Maybe it can only be enabled
> > > if the other ColumnFamily settings are compatible.
> > >
> > > Matt
> > >
> > >
> > > On Thu, May 24, 2012 at 9:37 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > What Anoop said is in 0.94.0
> > > >
> > > > For trunk, HBASE-4676 provides trie data block encoding.
> > > > It suits write-once read-many use case very well.
> > > >
> > > > Cheers
> > > >
> > > > On Thu, May 24, 2012 at 5:57 PM, Anoop Sam John <anoopsj@huawei.com> wrote:
> > > >
> > > > > Hi Anil,
> > > > >           There is no way you can avoid the timestamp with KVs. In your
> > > > > case, you could consider using data block encoding. See
> > > > > FastDiffDeltaEncoder and DiffKeyDeltaEncoder. These include a way of
> > > > > avoiding writing the 8 bytes into each KV for the timestamp. Some bytes
> > > > > will still be written, though, and this is done at the block level. Also
> > > > > please note that these encoders do much more than the timestamp space
> > > > > optimization. You also need to make sure to pass some timestamp in your
> > > > > Puts; it may be better to use 0L. Otherwise, on the RS side HBase will
> > > > > assign the current time as the timestamp.  Hopefully it will be clearer
> > > > > when you read the javadoc for these encoder classes.
> > > > >
> > > > > The feature you are describing, fully avoiding the timestamp,
> > > > > is a topic to discuss.
> > > > >
> > > > > Hope I made it clear to you.
> > > > >
> > > > > -Anoop-
> > > > > ________________________________________
> > > > > From: anil gupta [anilgupta84@gmail.com]
> > > > > Sent: Friday, May 25, 2012 3:21 AM
> > > > > To: user@hbase.apache.org
> > > > > > Subject: Disable timestamp in HBase Table a.k.a Disable Versioning in
> > > > > > HBase Table
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > We are planning to store data in HBase. In one of our use cases,
> > > > > > once a row is written into an HBase table we won't be modifying the
> > > > > > data of that row. Since a timestamp (long value) is stored for every
> > > > > > cell (right?) in HBase, this takes up an extra 8 bytes per cell. Is
> > > > > > there a way to disable timestamps on an HBase table when versioning is
> > > > > > not required? I went through the documentation and searched the mailing
> > > > > > list but could not find anything relevant. Since we are talking about
> > > > > > billions of cells, this would add up to a significant amount of space
> > > > > > (around 7.45 gigabytes for 1 billion cells). Does this sound like a
> > > > > > feature HBase is missing?
> > > > > >
> > > > > > Please share your thoughts.
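
The 7.45 GB estimate in the question above can be reproduced with a short Java sketch. It assumes 8 bytes per timestamp, no encoding or compression, and binary gigabytes (2^30 bytes); the class and method names are illustrative only.

```java
import java.util.Locale;

public class TimestampOverhead {
    /** Size of one cell timestamp: a Java long. */
    static final long TIMESTAMP_BYTES = 8L;

    /** Raw timestamp bytes for the given number of cells, before any encoding or compression. */
    static long timestampBytes(long cells) {
        return cells * TIMESTAMP_BYTES;
    }

    /** Converts a byte count to binary gigabytes (1 GiB = 2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long cells = 1_000_000_000L; // 1 billion cells, as in the question
        // prints "7.45 GiB"
        System.out.println(String.format(Locale.ROOT, "%.2f GiB", toGiB(timestampBytes(cells))));
    }
}
```

In decimal units the same 8,000,000,000 bytes are exactly 8 GB, so the quoted figure corresponds to the binary convention.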
> > > > >
> > > > > --
> > > > > Thanks & Regards,
> > > > > Anil Gupta
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta
