hbase-user mailing list archives

From Adrien Mogenet <adrien.moge...@gmail.com>
Subject Re: schema design: rows vs wide columns
Date Sun, 28 Apr 2013 15:23:22 GMT
Wide area :-)

I agree with Michael. Perhaps the best explanation would be to make explicit
*WHEN* adding an extra CF makes perfect sense.
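
To make the trade-off concrete, here is a toy Python model of the flush
behaviour Lars describes further down the thread: when one CF's memstore grows
large enough to need a flush, every CF in the region is flushed with it. This
is only a sketch, not HBase's real flush logic; the function name, the CF
names, the write rates and the threshold are all made up for illustration, and
sizes are in arbitrary units.

```python
def simulate_flushes(write_rates, flush_threshold, steps):
    """Toy model (assumption, not HBase code): flush every CF whenever
    any single CF's memstore reaches flush_threshold."""
    memstores = {cf: 0 for cf in write_rates}
    store_files = {cf: [] for cf in write_rates}
    for _ in range(steps):
        # Each step, every CF receives its share of writes.
        for cf, rate in write_rates.items():
            memstores[cf] += rate
        # One CF hitting the threshold forces a flush of all CFs.
        if max(memstores.values()) >= flush_threshold:
            for cf in memstores:
                if memstores[cf] > 0:
                    store_files[cf].append(memstores[cf])
                memstores[cf] = 0
    return store_files


# Hypothetical workload: "big" receives 16x the writes of "small".
files = simulate_flushes({"big": 16, "small": 1}, flush_threshold=128, steps=80)
for cf, sizes in files.items():
    print(cf, len(sizes), "store files, sizes:", sizes)
```

With a 16:1 write skew, the small CF ends up with as many store files as the
big one, each a sixteenth of the size, which is the "lumpiness" problem in
miniature.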


On Tue, Apr 16, 2013 at 4:35 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> I think the important thing about Column Families is understanding
> how to use them properly in a design.
>
> Sparse data may make sense. It depends on the use case and an
> understanding of the trade-offs.
>
> It all depends on how the data breaks down into specific use cases.
>
> Keeping CFs to a minimum makes sense. However, what that minimum is remains
> to be seen.
>
> It depends....
>
>
> On Apr 16, 2013, at 9:08 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > bq. Maybe we can explain why there are some impacts, or what to consider?
> >
> > The above would be covered in the JIRA.
> >
> > Thanks
> >
> > On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> Can we add more details than just changing the maximum CF number? Maybe
> >> we can explain why there are some impacts, or what to consider?
> >>
> >> JM
> >>
> >> 2013/4/16 Ted Yu <yuzhihong@gmail.com>
> >>
> >>> If there is no objection, I will create a JIRA to increase the maximum
> >>> number of column families described here:
> >>>
> >>> http://hbase.apache.org/book.html#number.of.cfs
> >>>
> >>> Cheers
> >>>
> >>> On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <doug.meil@explorysmedical.com>
> >>> wrote:
> >>>
> >>>>
> >>>>
> >>>> For the record, the refGuide mentions potential issues of CF lumpiness
> >>>> that you mentioned:
> >>>>
> >>>> http://hbase.apache.org/book.html#number.of.cfs
> >>>>
> >>>>
> >>>> 6.2.1. Cardinality of ColumnFamilies
> >>>>
> >>>> Where multiple ColumnFamilies exist in a single table, be aware of the
> >>>> cardinality (i.e., number of rows).
> >>>> If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
> >>>> rows, ColumnFamilyA's data will likely be spread
> >>>> across many, many regions (and RegionServers).  This makes mass
> >>>> scans for ColumnFamilyA less efficient.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> ... anything that needs to be updated/added for this?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 4/8/13 12:39 AM, "lars hofhansl" <larsh@apache.org> wrote:
> >>>>
> >>>>> I think the main problem is that all CFs have to be flushed if one gets
> >>>>> large enough to require a flush.
> >>>>> (Does anyone remember why exactly that is? And do we still need that now
> >>>>> that the memstoreTS is stored in the HFiles?)
> >>>>>
> >>>>>
> >>>>> So things are fine as long as all CFs have roughly the same size. But if
> >>>>> you have one that gets a lot of data and many others that are smaller,
> >>>>> we'd end up with a lot of unnecessary and small store files from the
> >>>>> smaller CFs.
> >>>>>
> >>>>> Anything else known that is bad about many column families?
> >>>>>
> >>>>>
> >>>>> -- Lars
> >>>>>
> >>>>>
> >>>>>
> >>>>> ________________________________
> >>>>> From: Andrew Purtell <apurtell@apache.org>
> >>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >>>>> Sent: Sunday, April 7, 2013 3:52 PM
> >>>>> Subject: Re: schema design: rows vs wide columns
> >>>>>
> >>>>> Is there a pointer to evidence/experiment backed analysis of this
> >>>>> question?
> >>>>> I'm sure there is some basis for this text in the book but I recommend we
> >>>>> strike it. We could replace it with YCSB or LoadTestTool driven latency
> >>>>> graphs for different workloads maybe. Although that would also be a big
> >>>>> simplification of 'schema design' considerations, it would not be so
> >>>>> starkly lacking background.
> >>>>>
> >>>>> On Sunday, April 7, 2013, Ted Yu wrote:
> >>>>>
> >>>>>> From http://hbase.apache.org/book.html#number.of.cfs :
> >>>>>>
> >>>>>> HBase currently does not do well with anything above two or three column
> >>>>>> families so keep the number of column families in your schema low.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> On Sun, Apr 7, 2013 at 3:04 PM, Stack <stack@duboce.net> wrote:
> >>>>>>
> >>>>>>> On Sun, Apr 7, 2013 at 11:58 AM, Ted <yuzhihong@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> With regard to number of column families, 3 is the recommended
> >>>>>>>> maximum.
> >>>>>>>>
> >>>>>>>
> >>>>>>> How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> >>>>>>> depend?  If the latter, on what does it depend?
> >>>>>>> Thanks,
> >>>>>>> St.Ack
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>>
> >>>>>  - Andy
> >>>>>
> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>>>> (via Tom White)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
>
>


-- 
Adrien Mogenet
http://www.borntosegfault.com
