hbase-user mailing list archives

From 林煒清 <thesuperch...@gmail.com>
Subject Re: Pack rows into a wide row for better performance?
Date Thu, 29 Aug 2013 02:41:46 GMT
Oh, so they have them packed into one cell. If so, it is now reasonable
that they claim it speeds up row seeking.
Thanks a lot.
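
As a rough illustration of the packing discussed in the quoted messages below, here is a minimal size model. It assumes the plain KeyValue layout (key length, value length, then row, family, qualifier, timestamp, type and value) with no compression or block encoding. The row key size, family name length, qualifier size and value size are hypothetical numbers picked for the example, and the "packed" case assumes qualifiers and values are simply concatenated into one cell.

```python
# Rough size model for one hour of one-second data points under one row key.
# Assumed layout of a stored KeyValue (no compression, no block encoding):
#   4B key length + 4B value length
#   key = 2B row length + row + 1B family length + family
#         + qualifier + 8B timestamp + 1B type
#   value

ROW_LEN = 40      # hypothetical row key size in bytes
FAMILY_LEN = 1    # hypothetical one-byte column family name
QUAL_LEN = 2      # hypothetical per-point qualifier (offset within the hour)
VALUE_LEN = 8     # hypothetical per-point value size
POINTS = 3600     # one point per second for one hour


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Bytes used by one KeyValue under the assumed layout above."""
    key_len = 2 + row_len + 1 + family_len + qualifier_len + 8 + 1
    return 4 + 4 + key_len + value_len


# One cell per data point: the row key is repeated in every KeyValue.
unpacked = POINTS * keyvalue_size(ROW_LEN, FAMILY_LEN, QUAL_LEN, VALUE_LEN)

# One cell per hour (assumed): qualifiers and values concatenated in one cell.
packed = keyvalue_size(ROW_LEN, FAMILY_LEN,
                       POINTS * QUAL_LEN, POINTS * VALUE_LEN)

print("unpacked bytes:", unpacked)
print("packed bytes:  ", packed)
print("ratio: %.1fx" % (unpacked / packed))
```

With these made-up sizes the row key dominates each unpacked KeyValue, so the ratio is sensitive to ROW_LEN. The packed form also leaves a single KeyValue where there were 3600, which is presumably what makes seeking to the row cheaper.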


2013/8/28 Chris Perluss <tradersancho@gmail.com>

> Sorry, accidentally hit send. I'm guessing a 10 minute time slice would
> drop their space savings from 4-8x down to 2-4x.
> On Aug 27, 2013 11:30 PM, "Chris Perluss" <tradersancho@gmail.com> wrote:
>
> > I'm still kinda new to HBase so please excuse me if I am wrong.  I
> > suspect the reason has to do with a different slide from their
> > presentation where they run a job every hour to combine all the cells
> > from the previous hour into one cell.
> >
> > OpenTSDB has quite a long row key. It contains the metric name, the
> > timestamp, and numerous optional tags. If you wrote one metric every
> > second then you would write 3600 columns per row key. Since the row key
> > is very long, it uses quite a bit of space to store the same row key
> > 3600 times. By combining an hour's worth of data into one cell,
> > OpenTSDB claims they cut their storage by 4-8x.
> >
> > If they stayed with their original 10 minute time slice then they would
> > have to store their giant row key 6 times per hour instead of once. I'm
> > going to guess this
> > On Aug 27, 2013 10:50 PM, "林煒清" <thesuperching@gmail.com> wrote:
> >
> >> *Context*:
> >>
> >> Recently I saw that OpenTSDB packs its rows by time period, ending up
> >> with tens to hundreds of columns per row. It claims that this design
> >> is more efficient for row seeking (slide: "Lessons learned from
> >> OpenTSDB").
> >>
> >> *My argument*:
> >>
> >> If *a block of an HFile* is indexed by its own start key, where the
> >> key is made of {row, cf, cq}, then I think the read time for a specific
> >> key should be the same for tall and wide tables alike, since the
> >> physical storage is sorted by the whole key, not only by rowkey.
> >>
> >> So under one column family, rowkey+column forms one key as a whole:
> >> shifting a part of the rowkey into the column is the same as shifting
> >> it to the tail of the rowkey, and vice versa.
> >>
> >> Following this logic, from the physical point of view all OpenTSDB did
> >> was change the key layout by shifting a portion of the timestamp bytes
> >> to the position behind the rowkey, that is, the column qualifier.
> >>
> >> *Question*:
> >>
> >> 1. When getting (a get is a special scan, right?) a packed row worth
> >> one hour of data, or scanning over a one-hour range of rows, I don't
> >> see how there could be any performance improvement. So why does
> >> OpenTSDB say packed rows have better performance for row seeking?
> >>
> >> 2. Almost all docs and books recommend the tall table design. In
> >> particular, the book "HBase in Action" says that read performance,
> >> among other considerations, is the reason the tall design is adopted,
> >> though I still don't understand why.
> >>
> >> 3. Also, the KeyValues inside a block are searched by *linear* scan,
> >> and the start keys of blocks by binary search, right?
> >>
> >> Any hint is much appreciated.
> >>
> >
>
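
The argument in the original question (keys sorted as {row, cf, cq}, binary search over block start keys, linear scan inside a block) can be sketched as below. This is a toy model, not HBase code: the series identifier, the column family name and the fixed 64-keys-per-block cut are assumptions for illustration, real keys also carry a timestamp and type, and real blocks are cut by byte size rather than key count. The wide layout here keeps one cell per point, so it models only the "shift timestamp bytes into the qualifier" step, not the hourly packing into one cell.

```python
import bisect
import struct

SERIES = b"sys.cpu.user|host=web01"   # hypothetical series identifier
FAMILY = b"t"                         # hypothetical column family
SPAN = 3600                           # seconds covered by one wide row


def tall_key(ts):
    # Full timestamp at the tail of the row key, empty qualifier.
    return (SERIES + struct.pack(">I", ts), FAMILY, b"")


def wide_key(ts):
    # Hour-aligned timestamp in the row key, offset in the qualifier.
    base = ts - ts % SPAN
    return (SERIES + struct.pack(">I", base), FAMILY,
            struct.pack(">H", ts - base))


def build_blocks(keys, per_block):
    # Sort keys as (row, cf, cq) and cut them into fixed-size "blocks".
    ordered = sorted(keys)
    return [ordered[i:i + per_block]
            for i in range(0, len(ordered), per_block)]


def seek(blocks, target):
    # Binary search over block start keys, then a linear scan in the block.
    index = [block[0] for block in blocks]
    pos = bisect.bisect_right(index, target) - 1
    if pos < 0:
        return None
    for steps, key in enumerate(blocks[pos], start=1):
        if key == target:
            return pos, steps
    return None


timestamps = [1377650000 + 7 * i for i in range(2000)]
tall_blocks = build_blocks([tall_key(ts) for ts in timestamps], per_block=64)
wide_blocks = build_blocks([wide_key(ts) for ts in timestamps], per_block=64)

probe = timestamps[1234]
print(seek(tall_blocks, tall_key(probe)))
print(seek(wide_blocks, wide_key(probe)))
```

In this model both layouts sort the data points in the same order, so a seek lands in the same block after the same number of steps. That matches the reasoning in the question: moving bytes between rowkey and qualifier alone changes nothing, and any gain has to come from having fewer, larger cells.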
