hbase-user mailing list archives

From Dave Birdsall <dave.birds...@esgyn.com>
Subject RE: table schema - row with many column vs many rows
Date Thu, 26 Jan 2017 22:12:57 GMT
My guess (and it is only a guess) is that you are traversing much less of the call stack when
you fetch one row of 20 columns than when you fetch 20 rows each with one column.

-----Original Message-----
From: Daniel Połaczański [mailto:dpolaczanski@gmail.com] 
Sent: Thursday, January 26, 2017 1:57 PM
To: user@hbase.apache.org
Subject: table schema - row with many column vs many rows

At work we were testing the following scenarios regarding scan performance. We stored
2500 domain rows, each containing 20 attributes, and after that read one random row with
all of its attributes a couple of times.

Scenario A
Every attribute is stored in a dedicated column: one HBase row with 20 columns.

Scenario B
Every attribute is stored as a separate row under a key like RowKey:AttributeKey, so we have
20 rows for one domain row.

As we know, in HBase everything is stored as an entry of the form RowKey:ColumnKey:Value.

Theoretically we have the same number of entries in HBase (2500*20) for both scenarios, so
there shouldn't be any difference in performance. But it looks like scanning in Scenario A
is much faster (something like 10 times).
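
To make the comparison concrete, here is a rough sketch of the two layouts as lists of
(row key, column key, value) entries. It is a plain-Python model only and does not talk to
HBase; the key names ("domain0001", "attr00"), the single column name "v" in Scenario B and
the helper names are assumptions made up for illustration, not the names used in the test.

```python
# Plain-Python model of the two layouts; it does not connect to HBase.
# All key and column names below are hypothetical, chosen for illustration.

NUM_DOMAIN_ROWS = 2500
NUM_ATTRIBUTES = 20


def scenario_a():
    """One row per domain row, one column per attribute."""
    return [
        (f"domain{d:04d}", f"attr{a:02d}", f"value-{d}-{a}")
        for d in range(NUM_DOMAIN_ROWS)
        for a in range(NUM_ATTRIBUTES)
    ]


def scenario_b():
    """One row per attribute, keyed as RowKey:AttributeKey, single column."""
    return [
        (f"domain{d:04d}:attr{a:02d}", "v", f"value-{d}-{a}")
        for d in range(NUM_DOMAIN_ROWS)
        for a in range(NUM_ATTRIBUTES)
    ]


def summarize(entries):
    """Return (number of entries, number of distinct row keys)."""
    return len(entries), len({row_key for row_key, _, _ in entries})


entries_a, rows_a = summarize(scenario_a())
entries_b, rows_b = summarize(scenario_b())
print("A:", entries_a, "entries in", rows_a, "rows")
print("B:", entries_b, "entries in", rows_b, "rows")
```

In this model both scenarios hold the same 2500*20 entries, but Scenario B spreads them over
20 times as many rows, so reading one domain row means fetching 20 rows instead of one.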

Do you maybe have an idea why Scenario A is better?
