hbase-user mailing list archives

From Chris Perluss <tradersan...@gmail.com>
Subject Re: Pack rows into a wide row for better performance?
Date Wed, 28 Aug 2013 06:33:10 GMT
Sorry, accidentally hit send. I'm guessing a 10-minute time slice would
drop their space savings from 4-8x down to 2-4x.
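
A back-of-the-envelope sketch of the row-key overhead argument in this thread. It assumes the classic uncompressed HBase KeyValue layout (20 fixed bytes per cell plus row, family, qualifier and value) and ignores compression, block encoding and any per-cell metadata a packing job might add. The default sizes (40-byte row key, 1-byte family, 2-byte qualifier, 8-byte value) and the names `keyvalue_bytes` and `hour_bytes` are hypothetical, picked only for illustration; they are not OpenTSDB's real figures.

```python
# Fixed per-KeyValue bytes in the classic HBase layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1).
KV_FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_bytes(row_len, family_len, qualifier_len, value_len):
    """Uncompressed size of one KeyValue, in bytes."""
    return KV_FIXED_BYTES + row_len + family_len + qualifier_len + value_len


def hour_bytes(points_per_cell, row_len=40, family_len=1,
               qualifier_len=2, value_len=8, points_per_hour=3600):
    """Bytes for one hour of data when each cell holds points_per_cell points.

    A packed cell is modelled as the concatenation of its points'
    qualifiers and values. All default sizes are assumptions.
    """
    cells = points_per_hour // points_per_cell
    cell = keyvalue_bytes(row_len, family_len,
                          qualifier_len * points_per_cell,
                          value_len * points_per_cell)
    return cells * cell


unpacked = hour_bytes(points_per_cell=1)    # 3600 cells, row key stored 3600 times
packed = hour_bytes(points_per_cell=3600)   # 1 cell, row key stored once
print(unpacked, packed, round(unpacked / packed, 1))
```

Passing `points_per_cell=600` models a 10-minute slice packed into one cell per row. The ratio you get depends entirely on the sizes you assume.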
On Aug 27, 2013 11:30 PM, "Chris Perluss" <tradersancho@gmail.com> wrote:

> I'm still kinda new to HBase so please excuse me if I am wrong.  I suspect
> the reason has to do with a different slide from their presentation where
> they run a job every hour to combine all the cells from the previous hour
> into one cell.
>
> OpenTSDB has quite a long row key. It contains the metric name, the
> timestamp, and numerous optional tags. If you wrote one metric every second
> then you would write 3600 columns per row key. Since the row key is very
> long, it uses quite a bit of space to store the same row key 3600 times.
> By combining an hour's worth of data into one cell, OpenTSDB claims they
> cut their storage by 4-8x.
>
> If they stayed with their original 10 minute time slice then they would
> have to store their giant row key 6 times per hour instead of once. I'm
> going to guess this
> On Aug 27, 2013 10:50 PM, "林煒清" <thesuperching@gmail.com> wrote:
>
>> *Context*:
>>
>> Recently I saw that OpenTSDB packs its rows by time period, so each row
>> ends up with tens to hundreds of columns. It claims that this design is
>> more efficient for row seeking (slide: "Lessons learned from OpenTSDB").
>>
>> *My argument*:
>>
>>  If *a block of an HFile* is indexed by its own start key, and that key
>> is made of {row, cf, cq}, then I think the read time for a specific key
>> should be the same for tall and wide tables, since the physical
>> storage is sorted by the whole key, not only by the rowkey.
>>
>>  So under one column family the rowkey+column acts as a single key, and
>> shifting part of the rowkey into the column is the same as shifting that
>> part to the tail of the rowkey, and vice versa.
>>
>> Following this logic, in the physical view all OpenTSDB did was change
>> the key index by shifting a portion of the timestamp bytes to the
>> position after the rowkey, that is, the column qualifier.
>>
>> *Question*:
>>
>> 1. When getting (a get is a special scan, right?) a packed row worth one
>> hour of data, or scanning over a one-hour range of rows, I don't see how
>> there could be any performance improvement. So why does OpenTSDB say
>> packed rows have better performance for row seeking?
>>
>> 2. Almost all docs and books recommend a tall table design. The book
>> "HBase in Action" in particular says that, among other things, read
>> performance is the reason the tall design is adopted, though I still
>> don't understand why.
>>
>> 3. Also, the KeyValues inside a block are searched by a *linear* scan, and
>> the start keys of blocks by binary search, right?
>>
>> Any hint is much appreciated.
>>
>
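
The seek argument in the original question can be sketched as follows: a binary search over block start keys, then a linear scan inside the chosen block, with the same data points stored once in a tall layout and once in a wide one. This is only a toy in-memory model. Real HFile blocks are sized in bytes, the index can be multi-level, and block encodings change the in-block search. The metric name `cpu`, the family `t`, the block size of 64 keys and all function names are hypothetical.

```python
from bisect import bisect_right

BASE = 1377648000  # an hour-aligned Unix timestamp, chosen for illustration


def tall_key(t):
    # Tall layout: the full timestamp lives in the row key, empty qualifier.
    return (b"cpu" + (BASE + t).to_bytes(4, "big"), b"t", b"")


def wide_key(t):
    # Wide layout: the hour goes in the row key, the offset in the qualifier.
    hour = BASE + t - t % 3600
    return (b"cpu" + hour.to_bytes(4, "big"), b"t", (t % 3600).to_bytes(2, "big"))


def build_blocks(keys, block_size=64):
    keys = sorted(keys)  # storage is sorted by (row, family, qualifier)
    return [keys[i:i + block_size] for i in range(0, len(keys), block_size)]


def find(blocks, key):
    start_keys = [block[0] for block in blocks]
    i = bisect_right(start_keys, key) - 1  # last block whose start key <= key
    if i < 0:
        return None
    for pos, k in enumerate(blocks[i]):    # linear scan inside the block
        if k == key:
            return (i, pos)
        if k > key:
            break
    return None


tall_blocks = build_blocks(tall_key(t) for t in range(7200))
wide_blocks = build_blocks(wide_key(t) for t in range(7200))
print(find(tall_blocks, tall_key(5000)), find(wide_blocks, wide_key(5000)))
```

In this model both layouts sort the points in the same order, so the lookup lands on the same block and position either way. The model says nothing about how many bytes each layout occupies.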
