hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: About HBASE-3149
Date Sat, 21 Dec 2013 20:26:14 GMT
Bear in mind that how many files you'll have open simultaneously is a
function of the number of regions, the number of column families, and how
compaction organizes the HBase files on disk (the strategy in effect and
its parameters, the current ingest rate, and so on). You can ballpark this
as follows: If you have one column family in a table, and store data into
all the regions, then you will have one file open on the cluster per
region, or more. If you have 100,000 column families in a table, and store
data into all the regions and CFs, then you will have 100,000 files open on
the cluster per region, *or more*. You will run into OS and HDFS level
limits attempting this. I don't recommend it.
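
The ballpark above can be sketched as a small calculation. This is only a
rough lower bound, assuming every region holds data in every column family
and each (region, family) store keeps at least one file on disk. The
function name and the 1,000-region cluster are hypothetical, chosen for
illustration; real counts also depend on compaction and ingest rate.

```python
def min_open_store_files(regions, column_families, files_per_store=1):
    """Lower-bound estimate of store files open across the cluster.

    Assumes data is written to every region and every column family, so
    each (region, family) pair has at least `files_per_store` files.
    """
    if regions < 0 or column_families < 0 or files_per_store < 1:
        raise ValueError("counts must be non-negative, files_per_store >= 1")
    return regions * column_families * files_per_store


# One column family: at least one file per region.
per_region_one_cf = min_open_store_files(regions=1, column_families=1)

# 100,000 column families: at least 100,000 files per region.
per_region_100k_cf = min_open_store_files(regions=1, column_families=100_000)

# Hypothetical table with 1,000 regions and 100,000 column families.
cluster_total = min_open_store_files(regions=1_000, column_families=100_000)
```

Comparing the result against the per-process open file limit on the
region servers gives a quick sense of how far out of range a schema is.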

I don't think any reasonable schema design needs to produce a requirement
for 100,000 column *families*. You can have any number of keys of the form
<family>:<qualifier> in a column family. Varying the <qualifier> over
100,000 or 1,000,000 or more unique values is no problem. Can you say more
about what you are trying to accomplish?
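
To illustrate the alternative, here is a minimal sketch in plain Python (no
HBase client involved) of one row whose cells all live in a single column
family and differ only by qualifier. The family name "d", the "attr" prefix,
and the helper name are assumptions made up for the example.

```python
def wide_row(family, count):
    """Build a dict of cells keyed by b"family:qualifier" for one row.

    All cells share one column family; only the qualifier varies.
    """
    return {f"{family}:attr{i:07d}".encode(): b"" for i in range(count)}


cells = wide_row("d", 100_000)

# Collect the distinct family prefixes to confirm there is only one.
families = {key.split(b":", 1)[0] for key in cells}
```

Here the row carries 100,000 distinct columns, yet the table needs only one
column family, so the per-region file count stays in the range described
above for a single CF.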

On Sat, Dec 21, 2013 at 7:17 AM, 乃岩 <sohomodern@126.com> wrote:

> Hi,
>    Can anybody tell me if future HBase release will integrate 3149 for
> Make flush decisions per column family?
>   By the way, for current HBase, if the simultaneous flush is the only
> issue? I mean, to create 100000 CFs will not be a problem, right?
>   Thanks in advance!
> N.Y.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
