hbase-user mailing list archives

From Kamal Bahadur <mailtoka...@gmail.com>
Subject Re: Schema Design Newbie Question
Date Mon, 23 Dec 2013 23:47:11 GMT
Hi Dhaval,

Thanks for the quick response!

Why do you think having more files is not a good idea? Is it because of OS
restrictions?

I get around 50 million records a day, and each record contains ~25
columns. Each column value is ~30 characters long.
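
For a rough sense of scale, here is a back-of-the-envelope estimate of the
raw value volume those numbers imply. It is only a sketch: it assumes one
byte per character and counts column values only, so it leaves out the row
key, column family, qualifier, and timestamp that HBase stores alongside
every cell. The function name is made up for illustration.

```python
def daily_value_bytes(records_per_day: int, columns_per_record: int,
                      chars_per_value: int, bytes_per_char: int = 1) -> int:
    """Raw bytes of column values written per day (values only, no cell overhead)."""
    return records_per_day * columns_per_record * chars_per_value * bytes_per_char


if __name__ == "__main__":
    # Figures from this message: ~50 million records/day, ~25 columns, ~30 chars each.
    total = daily_value_bytes(50_000_000, 25, 30)
    # Assumption: 1 byte per character; decimal gigabytes (10**9 bytes).
    print(f"{total / 10**9:.1f} GB of raw values per day")
```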

Kamal


On Mon, Dec 23, 2013 at 3:35 PM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:

> 1000 CFs with HBase does not sound like a good idea.
>
> category + timestamp sounds like the better of the 2 options you have
> thought of.
>
> Can you tell us a little more about your data?
>
> Regards,
>
> Dhaval
>
>
> ________________________________
>  From: Kamal Bahadur <mailtokamal@gmail.com>
> To: user@hbase.apache.org
> Sent: Monday, 23 December 2013 6:01 PM
> Subject: Schema Design Newbie Question
>
>
> Hello,
>
> I am just starting to use HBase and I am coming from the Cassandra world.
> Here is some quick background on my data:
>
> My system will be storing data that belongs to a certain category.
> Currently I have around 1000 categories. Also note that some categories
> produce a lot more data than others. To be precise, 10% of the categories
> provide more than 65% of the total data in the system.
>
> Data access queries always contain the category. I have listed 2 options
> for designing the schema:
>
> 1. Add category as first component of the row key [category + timestamp] so
> that my data is sorted based on category for fast retrieval.
> 2. Add category as column family so that I can just use timestamp as
> rowkey. This option will, however, create more HFiles since I have many
> categories.
>
> I am leaning towards option 2. I like the idea that HBase separates data for
> each CF into its own HFiles. However, I am still worried about the number of
> HFiles that will be created on the server. Will it cause any other side
> effects? I would like to hear from the user community which option would be
> best in my case.
>
> Kamal
>
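
Option 1 in the quoted message (category + timestamp as the row key) can be
sketched as below. This is only an illustration of the byte layout, not HBase
client code. It assumes each category is mapped to a numeric id that fits in
2 bytes and that timestamps are non-negative milliseconds; the function names
are hypothetical. Because both fields are fixed-width and big-endian, the
lexicographic byte order that HBase uses for row keys matches ordering by
(category, timestamp), so all rows for one category form one contiguous range.

```python
import struct


def make_row_key(category_id: int, timestamp_ms: int) -> bytes:
    """Row key = 2-byte big-endian category id + 8-byte big-endian timestamp."""
    return struct.pack(">HQ", category_id, timestamp_ms)


def category_scan_range(category_id: int):
    """Return (start, stop) keys covering every row of one category.

    start is inclusive, stop is exclusive. Assumes category_id < 65535 so that
    category_id + 1 still fits in 2 bytes.
    """
    start = struct.pack(">H", category_id)
    stop = struct.pack(">H", category_id + 1)
    return start, stop


if __name__ == "__main__":
    keys = sorted([
        make_row_key(7, 2000),
        make_row_key(3, 5000),
        make_row_key(7, 1000),
    ])
    start, stop = category_scan_range(7)
    # Keep only the keys that fall inside the category 7 range.
    in_range = [k for k in keys if start <= k < stop]
    for k in in_range:
        print(struct.unpack(">HQ", k))
```

With a layout like this, the categories that produce most of the data simply
occupy larger key ranges, rather than each needing its own column family.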
