hbase-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject Re: Getting less write throughput due to more number of columns
Date Thu, 28 Mar 2013 14:26:53 GMT
Would prefix compression (https://issues.apache.org/jira/browse/HBASE-4676) improve this? 

This is an important question for schema design. Given the choice between storing a value
in a column and storing it in the rowkey, I would often want to put it in the rowkey if I
foresee it being used to constrain lookups, even if that is only a weak use case at the time
of schema design. But if keeping values in the rowkey carries substantial overhead compared
with keeping them in a column, then I would want to keep only the absolutely essential
identifiers in the rowkey. The overhead of storing values in the rowkey influences the choice
of what to store there.

On Mar 25, 2013, at 11:28 PM, Anoop Sam John <anoopsj@huawei.com> wrote:

> When the number of columns (qualifiers) is larger, yes, it can impact performance. In
> HBase, storage is everywhere in terms of KVs. The key will be something like rowkey+cfname+columnname+TS...
> So when you have 26 cells in a Put there will be a lot of repeated bytes in the keys (one
> KV per column), so you end up transferring more data. Within the memstore, more data (the actual
> KV data size) gets written, which means more frequent flushes, etc.
> Have a look at Intel Panthera Document Store impl.
> -Anoop-
> ________________________________________
> From: Ankit Jain [ankitjaincs06@gmail.com]
> Sent: Monday, March 25, 2013 10:19 PM
> To: user@hbase.apache.org
> Subject: Getting less write throughput due to more number of columns
> Hi All,
> I am writing records into HBase. I ran a performance test on the following
> two cases:
> Set1: Input record contains 26 columns and record size is 2 KB.
> Set2: Input record contains 1 column and record size is 2 KB.
> In the second case I am getting 8 MBps more throughput than in the first.
> Does a large number of columns have any impact on write performance, and if
> so, how can we overcome it?
> --
> Thanks,
> Ankit Jain
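
The per-cell key repetition described in the quoted reply can be estimated with a rough back-of-the-envelope sketch. The Python below assumes the classic KeyValue byte layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row, family, qualifier and value bytes). The row, family and qualifier lengths are hypothetical; they are not taken from the original test. The sketch ignores tags, WAL and RPC framing, and the per-cell object overhead inside the memstore, so treat the result as a lower bound on the difference rather than a measurement.

```python
# Fixed bytes in every KeyValue, assuming the classic layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1).
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Approximate serialized size in bytes of one KeyValue (one cell)."""
    return KV_FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def split_evenly(total, parts):
    """Split `total` bytes into `parts` near-equal chunks that sum to `total`."""
    base, extra = divmod(total, parts)
    return [base + 1] * extra + [base] * (parts - extra)


def put_size(row_len, family_len, qualifier_len, value_lens):
    """Approximate size of a Put: one KeyValue per column, key repeated each time."""
    return sum(
        keyvalue_size(row_len, family_len, qualifier_len, v) for v in value_lens
    )


# Hypothetical lengths, chosen only for illustration.
ROW_LEN = 16        # bytes in the rowkey
FAMILY_LEN = 2      # bytes in the column family name
QUALIFIER_LEN = 8   # bytes in each column qualifier
RECORD_BYTES = 2048  # 2 KB of payload, as in both test sets

wide = put_size(ROW_LEN, FAMILY_LEN, QUALIFIER_LEN, split_evenly(RECORD_BYTES, 26))
narrow = put_size(ROW_LEN, FAMILY_LEN, QUALIFIER_LEN, split_evenly(RECORD_BYTES, 1))

print("26 columns:", wide, "bytes")
print(" 1 column :", narrow, "bytes")
print("ratio     :", round(wide / narrow, 2))
```

Under these assumed lengths the 26-column Put carries 46 bytes of key and framing per cell, so the same 2 KB of payload grows to 3244 bytes against 2094 bytes for the single-column Put. A longer rowkey widens the gap, since the rowkey is repeated in every cell; that repetition is also what prefix-style block encoding targets in the stored files.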
