hbase-user mailing list archives

From Daniel Połaczański <dpolaczan...@gmail.com>
Subject Re: table schema - row with many column vs many rows
Date Fri, 27 Jan 2017 06:57:29 GMT
hi,
I don't use any encoding or compression.

HBase version 1.2.3


2017-01-27 0:11 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:

> Daniel:
> For the underlying column family, do you use any data block encoding /
> compression ?
>
> Which hbase release do you use ?
>
> Thanks
>
> On Thu, Jan 26, 2017 at 2:12 PM, Dave Birdsall <dave.birdsall@esgyn.com>
> wrote:
>
> > My guess (and it is only a guess) is that you are traversing much less of
> > the call stack when you fetch one row of 20 columns than when you fetch 20
> > rows each with one column.
> >
> > -----Original Message-----
> > From: Daniel Połaczański [mailto:dpolaczanski@gmail.com]
> > Sent: Thursday, January 26, 2017 1:57 PM
> > To: user@hbase.apache.org
> > Subject: table schema - row with many column vs many rows
> >
> > Hi,
> > At work we tested the following scenarios for scan performance. We stored
> > 2500 domain rows, each containing 20 attributes, and then read one random
> > row with all its attributes a couple of times.
> >
> > Scenario A
> > Every attribute is stored in a dedicated column: one HBase row with 20
> > columns.
> >
> > Scenario B
> > Every attribute is stored as a separate row under a key like
> > RowKey:AttributeKey, so we have 20 HBase rows for one domain row.
> >
> > As we know, in HBase everything is stored as entries of the form
> > RowKey:ColumnKey:Value
> >
> > Theoretically, HBase holds the same number of entries (2500*20) in both
> > scenarios, so there shouldn't be any difference in performance. But it
> > looks like scanning in scenario A is much faster (something like 10
> > times).
> >
> > Do you have any idea why Scenario A is better?
> >
> > Regards
> >
>
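
The two layouts from the original question can be sketched in plain Python, without an HBase client. This is only a model of the cell layout under assumed names: the column family "d", keys such as row0042 and attr07, and the single qualifier "v" in scenario B are all hypothetical. It measures nothing. It only shows that both scenarios hold the same number of cells, while reading one domain row touches 1 HBase row in scenario A and 20 in scenario B, which fits Dave's guess about per-row overhead.

```python
from itertools import groupby

FAMILY = "d"            # hypothetical column family name
N_DOMAIN_ROWS = 2500    # from the original message
N_ATTRIBUTES = 20       # from the original message


def scenario_a_cells():
    """One HBase row per domain row; each attribute is its own column."""
    return sorted(
        (f"row{r:04d}", FAMILY, f"attr{a:02d}", f"v{r}-{a}")
        for r in range(N_DOMAIN_ROWS)
        for a in range(N_ATTRIBUTES)
    )


def scenario_b_cells():
    """One HBase row per attribute, keyed as RowKey:AttributeKey."""
    return sorted(
        (f"row{r:04d}:attr{a:02d}", FAMILY, "v", f"v{r}-{a}")
        for r in range(N_DOMAIN_ROWS)
        for a in range(N_ATTRIBUTES)
    )


def rows_for_domain_row(cells, domain_key):
    """Group the cells of one domain row by HBase row key.

    A scan hands back one result per row key, so the number of groups
    is the number of rows the client has to process.
    """
    wanted = [
        c for c in cells
        if c[0] == domain_key or c[0].startswith(domain_key + ":")
    ]
    return {row: list(group) for row, group in groupby(wanted, key=lambda c: c[0])}


cells_a = scenario_a_cells()
cells_b = scenario_b_cells()
a = rows_for_domain_row(cells_a, "row0042")
b = rows_for_domain_row(cells_b, "row0042")

# (rows touched, cells returned) for one domain row in each scenario
print(len(a), sum(len(v) for v in a.values()))  # 1 20
print(len(b), sum(len(v) for v in b.values()))  # 20 20
```

Both tables contain 2500*20 cells, so any difference in scan time would have to come from work done per row rather than per cell. The sketch does not confirm that this is the actual cause of the 10x gap.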
