hbase-user mailing list archives

From "M. Karthikeyan" <m.karthike...@ericsson.com>
Subject RE: more tables or more rows
Date Wed, 08 Aug 2012 12:14:14 GMT
A slightly related question:
We have time series data continuously flowing into the system, and it has to be stored in HBase.
Our retention policy is to keep data for 90 days, so data older than 90 days has to be
deleted from HBase every midnight.
There are two ways (that we know of) of doing this:
1) Since bulk deletes could be costly and dropping an entire table is easier, we could have
day-wise tables and drop an entire table at a time.
2) This post http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/9603 suggests that
we can have a single table and use the TTL feature for ageing out data.
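
To make the two options concrete, here is a minimal Python sketch (plain date arithmetic, not HBase client code). The table prefix "events_", the YYYYMMDD naming scheme and both helper functions are hypothetical; only the 90-day retention comes from the description above. For option 1 it picks the day-wise tables a midnight job would drop. For option 2 it computes the TTL value, since HBase column family TTLs are given in seconds.

```python
from datetime import date, datetime, timedelta
from typing import List

RETENTION_DAYS = 90

# Option 2: TTL for a single table, expressed in seconds.
TTL_SECONDS = RETENTION_DAYS * 24 * 60 * 60


def table_name_for(day: date, prefix: str = "events_") -> str:
    """Option 1: name of the (hypothetical) day-wise table for a given day."""
    return f"{prefix}{day:%Y%m%d}"


def tables_to_drop(existing: List[str], today: date,
                   prefix: str = "events_") -> List[str]:
    """Return the day-wise tables whose day is more than 90 days before today."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our day-wise tables
        try:
            day = datetime.strptime(name[len(prefix):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a valid date, leave the table alone
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    today = date(2012, 8, 8)
    existing = [table_name_for(today - timedelta(days=n)) for n in range(95)]
    print("TTL for option 2 (seconds):", TTL_SECONDS)
    print("Tables to drop for option 1:", tables_to_drop(existing, today))
```

With option 1 the midnight job has to create the next day's table and drop the expired ones itself; with option 2 the expiry is handled by HBase once the TTL is set on the column family.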

May I request someone to briefly list the pros and cons of each option?
PS: We expect around 200 million records per day, and each record would be approx. 500 bytes.
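
For a sense of scale, the figures above can be turned into a rough estimate. This is back-of-the-envelope arithmetic on the raw record payload only; it assumes a uniform arrival rate and ignores HBase's per-cell key overhead, HDFS replication and compression, so actual on-disk usage will differ.

```python
RECORDS_PER_DAY = 200_000_000   # from the message
RECORD_BYTES = 500              # approximate record size, from the message
RETENTION_DAYS = 90             # retention policy, from the message

daily_bytes = RECORDS_PER_DAY * RECORD_BYTES        # raw payload per day
retained_bytes = daily_bytes * RETENTION_DAYS       # raw payload kept at any time
retained_rows = RECORDS_PER_DAY * RETENTION_DAYS    # records kept at any time
avg_records_per_sec = RECORDS_PER_DAY / 86_400      # assumes uniform arrival

if __name__ == "__main__":
    print(f"Raw data per day:      {daily_bytes / 1e9:.0f} GB")
    print(f"Raw data retained:     {retained_bytes / 1e12:.0f} TB")
    print(f"Records retained:      {retained_rows / 1e9:.0f} billion")
    print(f"Average ingest rate:   {avg_records_per_sec:.0f} records/s")
```

So either design has to age out roughly 100 GB of raw data (200 million records) every night.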
Thanks & Regards

-----Original Message-----
From: Mohammad Tariq [mailto:dontariq@gmail.com] 
Sent: 08 August 2012 03:19
To: user@hbase.apache.org
Subject: Re: more tables or more rows

Hello sir,

    It is absolutely fine to have as many tables as we like. My point was that if we have
a large number of tables, it might add some overhead in locating the user region, as there
will be a huge amount of mapping from "user tables" to "region servers". Also, the client will
have to cache more information, taking up additional memory. So I suggested having a small
number of large tables rather than a large number of small tables, if the data is similar.

    Mohammad Tariq

On Tue, Aug 7, 2012 at 5:30 PM, Eric Czech <eric@nextbigsound.com> wrote:
> Thanks Mohammad,
> By saying the major purpose is to host very large tables (implying a 
> smaller number of them), are you referring to anything other than the 
> memstores per column family taking up sizable portions of physical memory?
>  Are there other components or design aspects that make using large 
> numbers of tables inadvisable?
> On Sun, Aug 5, 2012 at 5:55 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>> Hello sir,
>>       Going for a single table with 30+ rows would be a better 
>> choice, if the data from all the sources is not very different. 
>> Since you are considering HBase as your data store, it wouldn't be 
>> wise to have several small rows. The major purpose of HBase is to 
>> host very large tables that may go beyond billions of rows and millions of columns.
>> Regards,
>>     Mohammad Tariq
>> On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech <eric@nextbigsound.com> wrote:
>>> I need to support data that comes from 30+ sources and the structure 
>>> of that data is consistent across all the sources, but what I'm not 
>>> clear on is whether or not I should use 30+ tables with roughly the 
>>> same format or 1 table where the row key reflects the source.
>>> Anybody have a strong argument one way or the other?
>>> Thanks!
