hbase-user mailing list archives

From innowireless TaeYun Kim <taeyun....@innowireless.co.kr>
Subject RE: Question on the number of column families
Date Tue, 05 Aug 2014 11:36:20 GMT
Thank you for your reply.

I can reduce the size of the column values if such large values are bad for HBase.
BTW, the values are for points on a grid of cells on a map.
250000 is 500x500, and 500x500 roughly corresponds to the size of the client screen that displays
the values on the map.
Normally a client requests the values for the area that is displayed on the screen.
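A minimal sketch of that request pattern, assuming (hypothetically) that the map grid is cut into 500x500-cell tiles stored one tile per HBase row, with a row key built from the tile's column and row index. The class `TileKeys`, the zero-padded key format, and a tile origin at cell (0, 0) are illustrative assumptions, not part of the design described in this thread. Given a screen viewport in grid-cell coordinates, it lists the row keys a client would fetch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: maps a screen viewport, given in grid-cell
// coordinates, to the row keys of the 500x500-cell tiles that cover it.
class TileKeys {
    // Assumed tile edge in grid cells (500 x 500 = 250000 cells per tile).
    static final int TILE_SIZE = 500;

    // Assumed row-key format: zero-padded tile column and row, e.g. "00003_00007".
    static String rowKey(int tileX, int tileY) {
        return String.format(Locale.ROOT, "%05d_%05d", tileX, tileY);
    }

    // Keys of every tile overlapping the inclusive cell range
    // [minX, maxX] x [minY, maxY]; coordinates are assumed non-negative.
    static List<String> tilesForViewport(int minX, int minY, int maxX, int maxY) {
        List<String> keys = new ArrayList<>();
        for (int ty = minY / TILE_SIZE; ty <= maxY / TILE_SIZE; ty++) {
            for (int tx = minX / TILE_SIZE; tx <= maxX / TILE_SIZE; tx++) {
                keys.add(rowKey(tx, ty));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Viewport aligned to a tile boundary: touches a single row.
        System.out.println(tilesForViewport(500, 500, 999, 999));
        // Same viewport size shifted by 250 cells: straddles four rows.
        System.out.println(tilesForViewport(750, 750, 1249, 1249));
    }
}
```

With this layout, a viewport aligned to tile boundaries needs one row, while an unaligned viewport of the same size can touch up to four rows. Zero-padding keeps the lexicographic row-key order consistent with numeric tile order (up to 99999 tiles per axis).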


-----Original Message-----
From: Alok Kumar [mailto:alokawi@gmail.com] 
Sent: Tuesday, August 05, 2014 8:24 PM
To: user@hbase.apache.org
Subject: Re: Question on the number of column families

Hi,

HBase creates separate HFiles for each column family. Having 130 column families is really not recommended.
It will increase the number of file handles (open file count) underneath.
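To see why the open file count grows, note that each column family in each region is a separate store with its own HFiles, so the number of store files is roughly regions × column families × HFiles per store. The sketch below is only that multiplication; the class name `StoreFileEstimate`, the region count, and the files-per-store value are made-up inputs for illustration, not measurements from this table:

```java
// Back-of-the-envelope estimate of HFiles (and thus potential open file
// handles) for one table. All inputs below are illustrative assumptions.
class StoreFileEstimate {
    // Each (region, column family) pair is a separate store with its own HFiles.
    static long estimateHFiles(long regions, long columnFamilies, long hfilesPerStore) {
        return regions * columnFamilies * hfilesPerStore;
    }

    public static void main(String[] args) {
        long regions = 100;     // hypothetical region count
        long filesPerStore = 3; // hypothetical HFiles per store between compactions
        System.out.println("3 CFs:   " + estimateHFiles(regions, 3, filesPerStore));   // 900
        System.out.println("130 CFs: " + estimateHFiles(regions, 130, filesPerStore)); // 39000
    }
}
```

The absolute numbers are arbitrary; the point is that the file count scales linearly with the number of column families.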

If you are sure which columns are "frequently" accessed by users, you could consider putting
them in one column family, and the infrequently accessed ones in another.
Btw, a column value of ~5MB is also something to consider. We should wait for some expert advice
here!
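The hot/cold split suggested above could be sketched as a simple threshold on per-field access counts. The `FamilyAssigner` class, the threshold, the access counts, and the family names `hot` and `cold` are all hypothetical; real counts would have to come from observed queries:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: assigns each field to a "hot" or "cold" column
// family based on how often it is accessed.
class FamilyAssigner {
    static Map<String, String> assign(Map<String, Long> accessCounts, long threshold) {
        Map<String, String> familyOf = new TreeMap<>(); // sorted by field name
        for (Map.Entry<String, Long> e : accessCounts.entrySet()) {
            // Fields accessed at least `threshold` times go to the hot family.
            familyOf.put(e.getKey(), e.getValue() >= threshold ? "hot" : "cold");
        }
        return familyOf;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("field_001", 12000L); // made-up access counts
        counts.put("field_002", 15L);
        counts.put("field_003", 8000L);
        System.out.println(assign(counts, 1000L));
        // {field_001=hot, field_002=cold, field_003=hot}
    }
}
```

Because a new table is created for every data load in this use case, the grouping could be revised at each load without migrating existing data.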


Thanks
Alok


On Tue, Aug 5, 2014 at 4:50 PM, innowireless TaeYun Kim <taeyun.kim@innowireless.co.kr> wrote:

> Plus, the value of each field can be up to ~5MB, since up to 250000
> lines of the source data will be merged into one record to match the
> request pattern.
>
>
> -----Original Message-----
> From: innowireless TaeYun Kim [mailto:taeyun.kim@innowireless.co.kr]
> Sent: Tuesday, August 05, 2014 8:11 PM
> To: user@hbase.apache.org
> Subject: Question on the number of column families
>
> Hi,
>
> According to http://hbase.apache.org/book/number.of.cfs.html, having
> more than 2 or 3 column families is strongly discouraged.
>
> In my case, however, the records in the table have the following characteristics:
>
> - The table is read-only. It is bulk-loaded once. When new data is
> ready, a new table is created and the old table is deleted.
>
> - The size of the source data can be hundreds of gigabytes.
>
> - A record has about 130 fields.
>
> - The number of fields in a record is fixed.
>
> - The names of the fields are also fixed (like a table in an RDBMS).
>
> - About 40 fields (the number varies) usually have values, while the other
> fields are mostly empty (null in an RDBMS).
>
> - It is not known in advance which fields will be dense; it depends on the source data.
>
> - Fields are accessed independently. Normally a user requests just one
> field, but a user can request several fields.
>
> - The range of a range query is the same for all fields (no wider,
> no narrower, regardless of the data density).
>
> It seems to me that having one column family per field would be more
> efficient, since it would cost less disk I/O: only the data for the
> needed columns would be read.
>
> Can the table have 130 column families in this case?
>
> Or must all the columns be in one column family?
>
> Thanks.


--
Alok Kumar
Email : alokawi@gmail.com
http://sharepointorange.blogspot.in/
http://www.linkedin.com/in/alokawi
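On the ~5MB value size from the original question: 5MB spread over 250000 merged lines is about 20 bytes per line. As an illustration only (the thread does not say how values are encoded), if each grid point were instead stored as one 4-byte float in a fixed 500x500 row-major layout, the merged value would be exactly 1,000,000 bytes. The `GridPacker` class and this encoding are hypothetical:

```java
import java.nio.ByteBuffer;

// Hypothetical encoding: one 4-byte float per grid point, row-major,
// for a fixed 500x500 grid (250000 points) merged into one cell value.
class GridPacker {
    static final int SIDE = 500;

    // Assumes grid is exactly SIDE x SIDE.
    static byte[] pack(float[][] grid) {
        ByteBuffer buf = ByteBuffer.allocate(SIDE * SIDE * Float.BYTES);
        for (int y = 0; y < SIDE; y++) {
            for (int x = 0; x < SIDE; x++) {
                buf.putFloat(grid[y][x]);
            }
        }
        return buf.array();
    }

    // Reads the value at (x, y) by offset, without decoding the whole array.
    static float valueAt(byte[] packed, int x, int y) {
        return ByteBuffer.wrap(packed).getFloat((y * SIDE + x) * Float.BYTES);
    }

    public static void main(String[] args) {
        float[][] grid = new float[SIDE][SIDE];
        grid[10][20] = 3.5f; // row y = 10, column x = 20
        byte[] packed = pack(grid);
        System.out.println(packed.length);           // 1000000
        System.out.println(valueAt(packed, 20, 10)); // 3.5
    }
}
```

A fixed layout like this also lets a reader locate a single point by offset arithmetic, as `valueAt` does, without parsing the whole value.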

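On the mostly-empty fields in the original question: HBase stores only the cells that are actually written, so a null field simply has no cell and takes no space. The sketch below models one record with 130 fixed field names, turning only non-null values into (qualifier, value) pairs. The class `SparseRecord` and the field names `f000`..`f129` are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of one row: 130 fixed field names, most values null.
// Only non-null fields become cells, mirroring how HBase stores only the
// cells that are actually written.
class SparseRecord {
    static final int FIELD_COUNT = 130;

    static String fieldName(int i) {
        return String.format(Locale.ROOT, "f%03d", i); // invented names f000..f129
    }

    // Converts a fixed-width record (nulls allowed) into the cells to write.
    static Map<String, byte[]> toCells(byte[][] values) {
        if (values.length != FIELD_COUNT) {
            throw new IllegalArgumentException("expected " + FIELD_COUNT + " fields");
        }
        Map<String, byte[]> cells = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_COUNT; i++) {
            if (values[i] != null) {
                cells.put(fieldName(i), values[i]);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        byte[][] record = new byte[FIELD_COUNT][];
        // Simulate about 40 dense fields; the rest stay null.
        for (int i = 0; i < 40; i++) {
            record[i * 3] = new byte[] {1, 2, 3};
        }
        System.out.println(toCells(record).size()); // 40
    }
}
```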
