hbase-user mailing list archives

From Gautam <gautamkows...@gmail.com>
Subject Re: Hbase row ingestion ..
Date Thu, 30 Apr 2015 01:03:48 GMT
Thanks for the quick response!

Our read path is fairly straightforward and very deterministic. We always
push down predicates at the rowkey level and read the row's full payload
(we never do projection/filtering over CQs). So, in theory, could I expect
a gain of as much as the current overhead of [ 40 * sizeof(rowkey) ]?
I'm curious to understand how much of that overhead is actually
incurred over the network and how much on the RS side, at least to the
extent it affects the put() / flush() calls. Let me know if there are
particular parts of the code or documentation I should be looking at for
this. I'd like to learn about the memory/network footprint of write calls.
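
To make the estimate above concrete, here is a rough back-of-the-envelope
sketch. It assumes the standard serialized KeyValue layout (4-byte key
length, 4-byte value length, 2-byte row length, row, 1-byte family length,
family, qualifier, 8-byte timestamp, 1-byte key type, value) and ignores
tags, MVCC/sequence ids, RPC framing, block encoding and compression. The
class name and the byte sizes in main() are made up for illustration; only
the 40 qualifiers come from our schema. It also assumes the Avro blob is
about as large as the 40 values combined, which may not hold in practice.

```java
// Rough per-row size estimate for 40 separate cells vs. one blob cell.
// Hypothetical helper, not part of HBase.
final class CellOverhead {
    // Fixed bytes per serialized KeyValue (no tags):
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    // Size of one cell: fixed fields plus row, family, qualifier and value.
    static long keyValueSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return (long) FIXED_BYTES + rowLen + familyLen + qualifierLen + valueLen;
    }

    // Size of a row stored as `cells` separate cells; each repeats the rowkey.
    static long wideRowSize(int cells, int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return cells * keyValueSize(rowLen, familyLen, qualifierLen, valueLen);
    }

    // Size of the same row stored as one cell whose value holds all fields.
    // Assumption: the blob is as large as all the individual values combined.
    static long blobRowSize(int cells, int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return keyValueSize(rowLen, familyLen, qualifierLen, cells * valueLen);
    }

    public static void main(String[] args) {
        int cells = 40;        // from the schema described in this thread
        int rowLen = 100;      // hypothetical "fat" rowkey, in bytes
        int familyLen = 1;     // hypothetical one-byte CF name
        int qualifierLen = 8;  // hypothetical average qualifier length
        int valueLen = 16;     // hypothetical average value length

        long wide = wideRowSize(cells, rowLen, familyLen, qualifierLen, valueLen);
        long blob = blobRowSize(cells, rowLen, familyLen, qualifierLen, valueLen);
        System.out.println("40 cells:  " + wide + " bytes");
        System.out.println("one blob:  " + blob + " bytes");
        System.out.println("saved:     " + (wide - blob) + " bytes per row");
    }
}
```

Under these assumptions the saving per row works out to 39 copies of
(fixed fields + rowkey + family + qualifier), so slightly more than the
rowkey bytes alone. How much of that shows up on the wire versus on the RS
side is still the open question.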

thank you,
-Gautam.


On Wed, Apr 29, 2015 at 5:48 PM, Esteban Gutierrez <esteban@cloudera.com>
wrote:

> Hi Gautam,
>
> Your reasoning is correct and that will improve the write performance,
> especially if you always need to write all the qualifiers in a row (sort of
> a rigid schema). However, you should consider using qualifiers to some
> extent if the read pattern might include some conditional search, e.g. if
> you are interested in filtering rows that have a particular qualifier.
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Wed, Apr 29, 2015 at 5:31 PM, Gautam <gautamkowshik@gmail.com> wrote:
>
> > .. I'd like to add that we have a very fat rowkey.
> >
> > - Thanks.
> >
> > On Wed, Apr 29, 2015 at 5:30 PM, Gautam <gautamkowshik@gmail.com> wrote:
> >
> > > Hello,
> > >        We've been fighting some ingestion perf issues on HBase and I have
> > > been looking at the write path in particular, which we are currently
> > > trying to optimize.
> > >
> > > We have around 40 column qualifiers (under a single CF) for each row. So I
> > > understand that each put(row) written into HBase would translate into 40
> > > (rowkey, cq, ts) cells in HBase. If I switched to an Avro object based
> > > schema instead, there would be a single (rowkey, avro_cq, ts) cell per row
> > > (all fields shoved into a single Avro blob). The question is, would this
> > > approach really translate into any write-path perf benefits?
> > >
> > > Cheers,
> > > -Gautam.
> > >
> > >
> > >
> >
> >
> >
> > --
> > "If you really want something in this life, you have to work for it. Now,
> > quiet! They're about to announce the lottery numbers..."
> >
>



-- 
"If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers..."
