hbase-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: MemStore and prefix encoding
Date Mon, 27 Aug 2012 16:20:16 GMT

I have been relying on the expected behavior (if I write another cell
with the same {key, family, qualifier, version} it won't return the
previous one) so your answer was confusing to me. I did more
research and I found that the HBase guide specifies that behavior (see
section 5.8.1 of http://hbase.apache.org/book.html).

Have I misunderstood something? Can I rely on behavior that is
specified in the guide?

Thanks again!


On Sun, Aug 26, 2012 at 6:43 AM, Eric Czech <eric@nextbigsound.com> wrote:
> Thanks for the info lars!
> In the potential use case I have for writing at the same timestamp,
> the values would always be the same anyways so I should be good.
> On Sat, Aug 25, 2012 at 9:12 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>> I checked the code to be sure...
>> In ScanWildcardColumnTracker we have this:
>>       if (sameAsPreviousTSAndType(timestamp, type)) {
>>         return ScanQueryMatcher.MatchCode.SKIP;
>>       }
>> And in ExplicitColumnTracker there is this:
>>         if (sameAsPreviousTS(timestamp)) {
>>           //If duplicate, skip this Key
>>           return ScanQueryMatcher.MatchCode.SKIP;
>>         }
>> I.e. the first KV is kept and the subsequent ones (with the same TS) are skipped.
>> My point remains, though: Do not rely on this.
>> (Though it will probably stay the way it is, because that is the most efficient way
>> to handle this in forward-only scanners.)
>> -- Lars
>> ________________________________
>>  From: Tom Brown <tombrown52@gmail.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>; lars hofhansl <lhofhansl@yahoo.com>
>> Sent: Saturday, August 25, 2012 4:54 PM
>> Subject: Re: MemStore and prefix encoding
>> I thought when multiple values with the same key, family, qualifier and timestamp
>> were written, the one that was written latest (as determined by position in the store) would
>> be read. Is that not the case?
>> --Tom
>> On Saturday, August 25, 2012, lars hofhansl <lhofhansl@yahoo.com> wrote:
>>> The prefix encoding applies to blocks in the HFiles and in the block cache, but
>>> not to the memstore.
>>> #1 Yes. Each column family is its own store. All stores are flushed together,
>>> so having many adds overhead (especially if a few tend to hold a lot of data, but the others
>>> don't, leading to very many small store files that need to be compacted).
>>> #2 There is only one key with the same key, column family, qualifier, and timestamp
>>> (if you write multiple with the same timestamp it is undefined which one you'll get back when
>>> you read the next time). So that does not make sense. Writes with the same key, column family,
>>> qualifier (each with a different timestamp) count towards the version limit.
>>> -- Lars
>>> ----- Original Message -----
>>> From: Eric Czech <eric@nextbigsound.com>
>>> To: user <user@hbase.apache.org>
>>> Cc:
>>> Sent: Saturday, August 25, 2012 2:44 PM
>>> Subject: MemStore and prefix encoding
>>> Hi everyone,
>>> Does prefix encoding apply to rows in MemStores or does it only apply
>>> to rows on disk in HFiles?  I'm trying to decide if I should still
>>> favor larger values in order to not repeat keys, column families, and
>>> qualifiers more than necessary and while prefix encoding seems to
>>> negate that concern for storage on disk, I'm not sure if it's still
>>> applicable to in-memory storage.
>>> Also, I had two other quick (unrelated) questions and I assume it'd be
>>> less annoying if I put them all in one email:
>>> 1.  Do column families defined for a table introduce any overhead for
>>> rows that don't put any values in them?  I don't think that's the case
>>> but I wanted to be sure.
>>> 2.  Do writes with the same key, column family, qualifier, and
>>> timestamp count towards the version limit?
>>> Thanks for the help!
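
For readers following the thread, the skip behavior Lars quotes above can be illustrated with a small standalone sketch. It does not use any HBase classes: `SameTimestampSkipSketch`, `Cell`, and `scan` are hypothetical names made up for illustration, and the loop only mimics the idea behind the `sameAsPreviousTS` checks (the first cell seen for a timestamp is kept, later cells with the same timestamp are skipped). Which of two same-timestamp writes a real scanner reaches first is exactly the part Lars says not to rely on, so the input order here is simply assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class SameTimestampSkipSketch {

    /** Hypothetical stand-in for a KeyValue of one column: timestamp and value only. */
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Walks the cells once, in the order given, and keeps a cell only if its
     * timestamp differs from the previously seen cell's timestamp.
     * The input order is an assumption of this sketch, not HBase's guarantee.
     */
    static List<Cell> scan(List<Cell> cellsInScanOrder) {
        List<Cell> kept = new ArrayList<>();
        boolean havePrevious = false;
        long previousTs = 0L;
        for (Cell cell : cellsInScanOrder) {
            if (havePrevious && cell.timestamp == previousTs) {
                continue; // duplicate timestamp: skip, as in the quoted trackers
            }
            kept.add(cell);
            previousTs = cell.timestamp;
            havePrevious = true;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(200L, "newer"));
        cells.add(new Cell(100L, "seen-first"));
        cells.add(new Cell(100L, "seen-second"));
        for (Cell cell : scan(cells)) {
            System.out.println(cell.timestamp + " " + cell.value);
        }
    }
}
```

In this sketch the two cells at timestamp 100 collapse to whichever one comes first in the list, which is why Eric's case (identical values at the same timestamp) is unaffected by the ordering.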
