hbase-dev mailing list archives

From 乃岩 <sohomod...@126.com>
Subject Re: Re: About HBASE-3149
Date Sun, 22 Dec 2013 03:07:54 GMT
Hello, thank you for your reply.
If we use only 1 or 2 CFs, why does HBase say it's a column data store? It's actually row-based data

I understand there is some trade-off in using a lot of CFs. However, I'd like to say that we should
have that option! We can put SSDs in the backend to absorb this I/O overhead, and the overhead is
no excuse to disable the option.

I just want to confirm whether HBase can support creating 100000 CFs, leaving aside the problems
with flushes and the pressure on disk I/O.

In addition, will 3149 be merged into 0.94, 0.96, or a future release?



From: Andrew Purtell
Date: 2013-12-22 04:26
To: dev@hbase.apache.org; sohomodern
Subject: Re: About HBASE-3149
Bear in mind that how many files you'll have open simultaneously is a function of the number of
regions, the number of column families, and how compaction organizes the HBase files on disk (the
strategy in effect and its parameters, the current ingest rate, and so on). You can ballpark
this as follows: if you have one column family in a table, and store data into all the regions,
then you will have one file open on the cluster per region, or more. If you have 100,000 column
families in a table, and store data into all the regions and CFs, then you will have 100,000
files open on the cluster per region, *or more*. You will run into OS- and HDFS-level limits
attempting this; I don't recommend it.
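
The ballpark above can be written out as a short calculation. This is only a sketch of the
arithmetic in this message: the function name, the `files_per_store` parameter, and the region
count of 100 are hypothetical, and the result is a lower bound, since compaction state and
ingest rate can leave more than one file per column family per region.

```python
def min_open_store_files(regions, column_families, files_per_store=1):
    """Rough lower bound on simultaneously open store files.

    Assumes every region holds data in every column family, with at
    least `files_per_store` files each (1 is the optimistic case).
    """
    if min(regions, column_families, files_per_store) < 1:
        raise ValueError("all arguments must be >= 1")
    return regions * column_families * files_per_store


# Per-region figures from the message: 1 CF versus 100,000 CFs.
one_cf_per_region = min_open_store_files(regions=1, column_families=1)
many_cf_per_region = min_open_store_files(regions=1, column_families=100_000)

# Hypothetical table with 100 regions (an assumed number, not from the thread).
many_cf_table = min_open_store_files(regions=100, column_families=100_000)

print(one_cf_per_region, many_cf_per_region, many_cf_table)
```

Even in the optimistic one-file-per-store case, the count scales with the product of regions
and column families, which is why the limits mentioned above are reached quickly.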

I don't think any reasonable schema design needs to produce a requirement for 100,000 column
*families*. You can have any number of keys with <column>:<qualifier> in a column
family; varying the <qualifier> to 100,000 or 1,000,000 or more unique values is no
problem. Can you say more about what you are trying to accomplish?
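
To illustrate the difference between many qualifiers and many families, here is a toy model
in plain Python. It is not the HBase client API: the tuple layout, the family names `d` and
`cf<i>`, and the function name are all hypothetical. It only relies on the point made above,
that data is kept in separate files per column family, so the number of groups tracks the
number of families and not the number of qualifiers.

```python
from collections import defaultdict


def group_cells_by_family(cells):
    """Group (row, family, qualifier, value) tuples by column family.

    Each resulting group stands in for one per-family store in a region.
    """
    stores = defaultdict(list)
    for row, family, qualifier, value in cells:
        stores[family].append((row, qualifier, value))
    return dict(stores)


# Schema A: one family "d" with 100,000 distinct qualifiers.
wide_qualifiers = [("row1", "d", f"attr{i}", b"v") for i in range(100_000)]

# Schema B: 100,000 families with one qualifier each.
many_families = [("row1", f"cf{i}", "q", b"v") for i in range(100_000)]

stores_a = group_cells_by_family(wide_qualifiers)
stores_b = group_cells_by_family(many_families)

print(len(stores_a), len(stores_b))
```

Both schemas hold the same number of cells, but only the second multiplies the number of
per-family groups.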

On Sat, Dec 21, 2013 at 7:17 AM, 乃岩 <sohomodern@126.com> wrote:

   Can anybody tell me whether a future HBase release will integrate 3149 ("Make flush decisions
per column family")?

  By the way, for current HBase, is the simultaneous flush the only issue? I mean, creating
100000 CFs will not be a problem, right?

  Thanks in advance!



Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)