hbase-user mailing list archives

From: Dhaval Shah <prince_mithi...@yahoo.co.in>
Subject: Re: Schema Design Newbie Question
Date: Mon, 23 Dec 2013 23:35:35 GMT
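1000 CFs in HBase does not sound like a good idea. HBase does not do well with more than two or three column families per table, because flushes and compactions are done per region across all of its column families.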

A row key of category + timestamp sounds like the better of the two options you have considered.

Can you tell us a little more about your data? 


From: Kamal Bahadur <mailtokamal@gmail.com>
To: user@hbase.apache.org
Sent: Monday, 23 December 2013 6:01 PM
Subject: Schema Design Newbie Question


I am just starting to use HBase, coming from the Cassandra world. Here
is some quick background on my data:

My system will be storing data that belongs to a certain category.
Currently I have around 1000 categories. Also note that some categories
produce a lot more data than others. To be precise, 10% of the categories
produce more than 65% of the total data in the system.

Data access queries always include the category. I have listed two
options for designing the schema:

1. Make the category the first component of the row key [category + timestamp]
so that my data is sorted by category for fast retrieval.
2. Make the category a column family so that I can use just the timestamp as
the row key. This option will, however, create more HFiles, since I will have
more column families.
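Option 1 relies on HBase sorting rows by the raw bytes of the row key. The following is a minimal sketch, in plain Python with no HBase client, of one way such a key could be encoded. The 0x00 separator, the 8-byte big-endian millisecond timestamp, and the helper names `make_row_key` and `category_scan_range` are illustrative assumptions, not part of the original question. Because every key for a category shares the same prefix, a query for one category (optionally bounded in time) becomes a single contiguous start/stop row range.

```python
import struct
from typing import Optional, Tuple

# Hypothetical separator: terminates the category so that keys for "ab"
# and "abc" never interleave.
SEPARATOR = b"\x00"


def make_row_key(category: str, timestamp_ms: int) -> bytes:
    """Composite key: UTF-8 category, 0x00 separator, 8-byte big-endian timestamp."""
    if "\x00" in category:
        raise ValueError("category must not contain the separator byte")
    if timestamp_ms < 0:
        raise ValueError("timestamp must be non-negative")
    # Unsigned big-endian encoding makes byte order match numeric order,
    # mirroring how HBase compares row keys as unsigned bytes.
    return category.encode("utf-8") + SEPARATOR + struct.pack(">Q", timestamp_ms)


def category_scan_range(category: str,
                        start_ms: Optional[int] = None,
                        end_ms: Optional[int] = None) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) for one category; stop_row is exclusive."""
    prefix = category.encode("utf-8") + SEPARATOR
    start = prefix if start_ms is None else make_row_key(category, start_ms)
    if end_ms is None:
        # 0x01 is the smallest byte greater than the separator, so this
        # bounds every key that begins with the category's prefix.
        stop = category.encode("utf-8") + b"\x01"
    else:
        stop = make_row_key(category, end_ms)
    return start, stop


if __name__ == "__main__":
    keys = sorted(make_row_key(c, t) for c, t in
                  [("orders", 300), ("clicks", 200), ("orders", 100), ("clicks", 50)])
    # Rows come back grouped by category, then ordered by time.
    for k in keys:
        category, ts = k[:-9].decode("utf-8"), struct.unpack(">Q", k[-8:])[0]
        print(category, ts)
```

With the HBase Java client, the two byte arrays would be passed to a `Scan` as its start and stop rows (`setStartRow`/`setStopRow` in the client API of this period). Using a fixed-width category ID instead of the name would avoid the separator altogether; which encoding fits better depends on the data.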

I am leaning towards option 2. I like the idea that HBase stores the data for
each CF in its own HFiles. However, I am still worried about the number of
HFiles that will be created on the server. Will it cause any other side
effects? I would like to hear from the user community which option would
be best in my case.
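The worry about HFile counts can be put into rough numbers. In HBase each region keeps one Store per column family, and a memstore flush writes one new HFile for every Store that holds data; in releases of this period a flush also covers the whole region. The sketch below assumes, hypothetically, that every column family in a region holds data at each flush. The function names and the region and flush counts in the example are made-up illustrations, not measurements.

```python
def store_count(regions: int, column_families: int) -> int:
    """Stores to manage: one per column family per region."""
    return regions * column_families


def hfiles_written(regions: int, column_families: int, flushes_per_region: int) -> int:
    """HFiles written before compaction, assuming every column family in a
    region has data at each whole-region flush (a simplifying assumption)."""
    for value in (regions, column_families, flushes_per_region):
        if value < 0:
            raise ValueError("counts must be non-negative")
    return store_count(regions, column_families) * flushes_per_region


if __name__ == "__main__":
    # Hypothetical workload: 20 regions, 10 flushes per region between compactions.
    for label, cfs in [("option 1: one CF", 1), ("option 2: one CF per category", 1000)]:
        print(label, store_count(20, cfs), hfiles_written(20, cfs, 10))
```

The total amount of data is the same in both cases, so under option 2 each flushed file would on average be far smaller. This is the kind of small-file churn that compaction then has to clean up.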
