hbase-user mailing list archives

From Stuti Awasthi <stutiawas...@hcl.com>
Subject RE: Using multiple column families
Date Tue, 13 Sep 2011 04:35:19 GMT
Thanks St.Ack

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Monday, September 12, 2011 11:02 PM
To: user@hbase.apache.org
Subject: Re: Using multiple column families

It depends on how you access the table.  Three to four column families may be an appropriate
schema if you mostly access individual cfs.
It's when you do cross-cf accesses that things can slow down (if most of your accesses fetch
all the data, then just have one cf).  Multiple cfs, if all active at the same time, can also make
the server's internal accounting a little messy.  We've not spent much time studying and optimizing
for this case, e.g. multi-cf flushing, compacting, and querying.
Because of this, query times can be slower.

St.Ack
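
The access-pattern point above can be sketched with a small toy model. This is not HBase code: `ToyRegion`, its methods, the `stores_touched` counter, and the family names are all hypothetical, and the model only assumes what the thread states, namely that each column family is kept in its own store. It shows that a read restricted to one family consults one store, while a read of the whole row consults every family's store.

```python
from collections import defaultdict


class ToyRegion:
    """Toy model: one separate store per column family (not the HBase API)."""

    def __init__(self, families):
        # Each family gets its own store, mirroring the per-family files
        # described in the thread.
        self.stores = {cf: defaultdict(dict) for cf in families}
        self.stores_touched = 0  # counts store lookups made by reads

    def put(self, row, cf, qualifier, value):
        self.stores[cf][row][qualifier] = value

    def get(self, row, families=None):
        """Read a row; families=None means every family."""
        wanted = families if families is not None else list(self.stores)
        result = {}
        for cf in wanted:
            self.stores_touched += 1  # one lookup per family read
            cells = self.stores[cf].get(row)
            if cells:
                result[cf] = dict(cells)
        return result


region = ToyRegion(["info", "p", "q", "r"])
region.put("row1", "info", "name", "A")
region.put("row1", "p", "p1", "x")
region.put("row1", "q", "q1", "y")

# Read restricted to a single family.
single = region.get("row1", families=["p"])
single_cost = region.stores_touched

# Read of the whole row, across all families.
region.stores_touched = 0
everything = region.get("row1")
full_cost = region.stores_touched
```

In the HBase Java client the same restriction is expressed with `Get.addFamily` or `Scan.addFamily`; the toy counter stands in for the per-family work the server does and says nothing about real latencies.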

On Mon, Sep 12, 2011 at 12:05 AM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:
> Hi,
>
> I am also looking for an answer to a similar question. In my scenario we will have petabytes
> of data to handle. Currently I am working with a schema that has 3-4 column families.
> What major issues can we face if we have multiple column families?
>
> I have read that each column family is stored in separate HFiles on the region server,
> so a search by row id and column family is efficient because the lookup only goes to the HFiles
> for that specific column family.
> If we use a flat table structure, we will end up with more tables and with data
> replication, because of the dependencies between the data.
>
> Please suggest
>
>
> -----Original Message-----
> From: Imran M Yousuf [mailto:imyousuf@gmail.com]
> Sent: Saturday, September 10, 2011 6:55 AM
> To: user@hbase.apache.org
> Subject: Re: Using multiple column families
>
> Hi J-D,
>
> Thanks for your feedback.
>
> (replies inline)
> On Sat, Sep 10, 2011 at 5:39 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> 20k rows? If this is your only use case, you don't need HBase :)
>>
>
> It's one of several use cases.
>
>> If it's 20k rows times a gazillion columns per row, then I would 
>> recommend flattening out the rows instead.
>>
>
> Well, our guess at the moment is that there would not be more than 500 cells per family to
> start with.
>
>> If it's just one small table among others, then you probably won't be 
>> bothered by the multiple families.
>>
>
> We actually have many other tables which are flattened out to a single column family,
> and this is the one table for which we are using more than 1 column family.
>
> Thanks once again.
>
> Imran
>
>> J-D
>>
>> On Thu, Sep 8, 2011 at 10:07 PM, Imran M Yousuf <imyousuf@gmail.com> wrote:
>>> Hi,
>>>
>>> Firstly, I have read on the mailing list before that having more than
>>> 1 column family is not recommended. I am more interested to know
>>> whether it is a problem in my use case as well or not.
>>>
>>> I have a strong entity and it has 6 weak entities, all with a
>>> 1-to-many relationship to the strong entity. Furthermore,
>>> they are all loaded in a mutually exclusive manner, i.e. if A is the
>>> strong entity and its weak entities are P, Q, R, S, T, U, then
>>> no 2 weak entities are accessed at once. Moreover, their
>>> lifecycles are independent of each other. My current implementation
>>> has one column family for the strong entity and one for each weak entity.
>>> So for a given row I only load one column family at a time. The
>>> obvious advantages are that
>>> - deleting the strong entity automatically deletes the weak entities, as
>>>   they are in a single row;
>>> - deleting all weak entities of one kind for a specific strong entity
>>>   is as simple as deleting all cells in a column family for a row.
>>> Our assumption (well above what we actually expect)
>>> is that we will not have more than 20k rows in that table. Under
>>> these circumstances, how bad is it to have 7 column families?
>>>
>>> We would be glad if you would kindly share thoughts and feedback on this issue.
>>>
>>> Thank you,
>>>
>>> --
>>> Imran M Yousuf
>>> Entrepreneur & CEO
>>> Smart IT Engineering Ltd.
>>> Dhaka, Bangladesh
>>> Twitter: @imyousuf - http://twitter.com/imyousuf
>>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>>> Mobile: +880-1711402557
>>>
>>
>
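
The layout in the original question (one column family for the strong entity, one per kind of weak entity) can be sketched as a toy in-memory table to show the two delete cases it relies on. This is an illustration only: `ToyTable`, its methods, the row key `A-1`, and the family names are hypothetical and are not the HBase API.

```python
class ToyTable:
    """Toy model of a table keyed by row, then column family, then qualifier."""

    def __init__(self, families):
        self.families = tuple(families)
        self.rows = {}  # row key -> {family -> {qualifier -> value}}

    def put(self, row, cf, qualifier, value):
        if cf not in self.families:
            raise KeyError(f"unknown column family: {cf}")
        self.rows.setdefault(row, {}).setdefault(cf, {})[qualifier] = value

    def delete_family(self, row, cf):
        """Drop every cell of one family in one row."""
        self.rows.get(row, {}).pop(cf, None)

    def delete_row(self, row):
        """Drop the whole row, and with it every family's cells."""
        self.rows.pop(row, None)


table = ToyTable(["a", "p", "q"])
table.put("A-1", "a", "name", "strong entity")
table.put("A-1", "p", "p-1", "weak P #1")
table.put("A-1", "p", "p-2", "weak P #2")
table.put("A-1", "q", "q-1", "weak Q #1")

# Remove all weak entities of kind P for this strong entity.
table.delete_family("A-1", "p")
after_family_delete = sorted(table.rows["A-1"])

# Remove the strong entity; its remaining weak entities go with the row.
table.delete_row("A-1")
after_row_delete = "A-1" in table.rows
```

After the family delete only families `a` and `q` remain in the row, and after the row delete the row is gone entirely. In recent HBase Java clients the family-level case corresponds to `Delete.addFamily` (older releases called it `deleteFamily`).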
