hbase-user mailing list archives

From Kamal Bahadur <mailtoka...@gmail.com>
Subject Re: Schema Design Newbie Question
Date Mon, 23 Dec 2013 23:47:11 GMT
Hi Dhaval,

Thanks for the quick response!

Why do you think having more files is not a good idea? Is it because of OS

I get around 50 million records a day, and each record contains ~25
columns. Each column value is ~30 characters long.
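
As a rough back-of-the-envelope estimate, the figures above can be turned
into a daily volume. This is only a sketch: it assumes one byte per
character and ignores HBase's per-cell overhead (row key, family, qualifier
and timestamp are stored with every cell), as well as compression and HDFS
replication, so the real on-disk size will differ.

```python
# Figures taken from the message above.
RECORDS_PER_DAY = 50_000_000
COLUMNS_PER_RECORD = 25
CHARS_PER_VALUE = 30

# Assumption: 1 byte per character (e.g. ASCII values).
BYTES_PER_CHAR = 1

cells_per_day = RECORDS_PER_DAY * COLUMNS_PER_RECORD
raw_value_bytes = cells_per_day * CHARS_PER_VALUE * BYTES_PER_CHAR

print(f"cells per day:      {cells_per_day:,}")
print(f"raw value bytes:    {raw_value_bytes:,}")
print(f"raw value GB (1e9): {raw_value_bytes / 1e9:.1f}")
```

That comes to 1.25 billion cells and about 37.5 GB of raw values per day,
before any of the overheads mentioned above.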


On Mon, Dec 23, 2013 at 3:35 PM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:

> Having 1000 CFs in HBase does not sound like a good idea.
> category + timestamp sounds like the better of the 2 options you have
> thought of.
> Can you tell us a little more about your data?
> Regards,
> Dhaval
> ________________________________
>  From: Kamal Bahadur <mailtokamal@gmail.com>
> To: user@hbase.apache.org
> Sent: Monday, 23 December 2013 6:01 PM
> Subject: Schema Design Newbie Question
> Hello,
> I am just starting to use HBase and I am coming from the Cassandra world. Here
> is a quick background regarding my data:
> My system will be storing data that belongs to a certain category.
> Currently I have around 1000 categories. Also note that some categories
> produce a lot more data than others. To be precise, 10% of the categories
> provide more than 65% of the total data in the system.
> Data access queries always contain this category in the query. I have
> listed 2 options to design the schema:
> 1. Add category as first component of the row key [category + timestamp] so
> that my data is sorted based on category for fast retrieval.
> 2. Add category as column family so that I can just use timestamp as
> rowkey. This option will, however, create more HFiles since I have many
> categories.
> I am leaning towards option 2. I like the idea that HBase separates data for
> each CF into its own HFiles. However, I am still worried about the number of
> HFiles that will be created on the server. Will it cause any other side
> effects? I would like to hear from the user community as to which option
> will be the best option in my case.
> Kamal
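
For reference, option 1 in the quoted message (category + timestamp as the
row key) could be sketched roughly as below. This is only an illustration
in plain Python, not HBase client code: the 2-byte numeric category id, the
8-byte millisecond timestamp, and the helper names make_row_key and
scan_range are all assumptions made for the example. It relies on the fact
that HBase orders rows by comparing row key bytes lexicographically, so
fixed-width big-endian fields keep each category's rows together and in
time order.

```python
import struct


def make_row_key(category_id: int, ts_ms: int) -> bytes:
    """Build a [category + timestamp] row key (hypothetical layout).

    2-byte big-endian category id (enough for ~1000 categories) followed
    by an 8-byte big-endian timestamp in milliseconds.
    """
    return struct.pack(">HQ", category_id, ts_ms)


def scan_range(category_id: int, start_ms: int, stop_ms: int) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) keys for one category's time window."""
    return make_row_key(category_id, start_ms), make_row_key(category_id, stop_ms)


# Keys sort by category first, then by timestamp within a category.
keys = sorted([
    make_row_key(2, 50),
    make_row_key(1, 100),
    make_row_key(1, 20),
])

# A scan over category 1 in [0, 1000) would cover only category 1 rows.
start, stop = scan_range(1, 0, 1000)
in_range = [k for k in keys if start <= k < stop]
print(len(in_range))
```

With this layout, a query for one category and a time window maps to a
single contiguous range of row keys.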
