hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: schema design: rows vs wide columns
Date Tue, 16 Apr 2013 14:04:44 GMT
Can we add more details than just changing the maximum CF number? Maybe we
can explain why there are impacts, or what to consider?

JM

2013/4/16 Ted Yu <yuzhihong@gmail.com>

> If there is no objection, I will create a JIRA to increase the maximum
> number of column families described here:
>
> http://hbase.apache.org/book.html#number.of.cfs
>
> Cheers
>
> On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>
> >
> >
> > For the record, the refGuide mentions potential issues of CF lumpiness
> > that you mentioned:
> >
> > http://hbase.apache.org/book.html#number.of.cfs
> >
> >
> > 6.2.1. Cardinality of ColumnFamilies
> >
> > Where multiple ColumnFamilies exist in a single table, be aware of the
> > cardinality (i.e., number of rows). If ColumnFamilyA has 1 million rows
> > and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be
> > spread across many, many regions (and RegionServers). This makes mass
> > scans for ColumnFamilyA less efficient.
> >
> > ... anything that needs to be updated/added for this?
> >
> > On 4/8/13 12:39 AM, "lars hofhansl" <larsh@apache.org> wrote:
> >
> > >I think the main problem is that all CFs have to be flushed if one gets
> > >large enough to require a flush.
> > >(Does anyone remember why exactly that is? And do we still need that now
> > >that the memstoreTS is stored in the HFiles?)
> > >
> > >
> > >So things are fine as long as all CFs have roughly the same size. But if
> > >you have one that gets a lot of data and many others that are smaller,
> > >we'd end up with a lot of unnecessary and small store files from the
> > >smaller CFs.
> > >
> > >Anything else known that is bad about many column families?
> > >
> > >
> > >-- Lars
> > >
> > >
> > >
> > >________________________________
> > > From: Andrew Purtell <apurtell@apache.org>
> > >To: "user@hbase.apache.org" <user@hbase.apache.org>
> > >Sent: Sunday, April 7, 2013 3:52 PM
> > >Subject: Re: schema design: rows vs wide columns
> > >
> > >Is there a pointer to evidence/experiment backed analysis of this
> > >question?
> > >I'm sure there is some basis for this text in the book but I recommend we
> > >strike it. We could replace it with YCSB or LoadTestTool driven latency
> > >graphs for different workloads maybe. Although that would also be a big
> > >simplification of 'schema design' considerations, it would not be so
> > >starkly lacking background.
> > >
> > >On Sunday, April 7, 2013, Ted Yu wrote:
> > >
> > >> From http://hbase.apache.org/book.html#number.of.cfs :
> > >>
> > >> HBase currently does not do well with anything above two or three column
> > >> families so keep the number of column families in your schema low.
> > >>
> > >> Cheers
> > >>
> > >> On Sun, Apr 7, 2013 at 3:04 PM, Stack <stack@duboce.net> wrote:
> > >>
> > >> > On Sun, Apr 7, 2013 at 11:58 AM, Ted <yuzhihong@gmail.com> wrote:
> > >> >
> > >> > > With regard to number of column families, 3 is the recommended
> > >> > > maximum.
> > >> > >
> > >> >
> > >> > How did you come up w/ the number '3'?  Is it a 'hard' 3? Or does it
> > >> > depend?  If the latter, on what does it depend?
> > >> > Thanks,
> > >> > St.Ack
> > >> >
> > >>
> > >
> > >
> > >--
> > >Best regards,
> > >
> > >   - Andy
> > >
> > >Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > >(via Tom White)
> >
> >
> >
> >
>
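
To make the flush concern Lars raises in the quoted thread concrete, here is a
toy model in Python. It is not HBase code: the function name, the write rates,
and the threshold are all invented for illustration, and it assumes, as Lars
describes, that one column family reaching the flush threshold forces every
family in the region to flush.

```python
def simulate_flushes(write_rates, flush_threshold, ticks):
    """Toy model of region-wide flushing (hypothetical, not the HBase API).

    write_rates: dict mapping a column family name to units written per tick.
    flush_threshold: in-memory size at which a family requires a flush.
    ticks: number of write rounds to simulate.
    Returns a dict mapping each family to the sizes of the files it flushed.
    """
    memstore = {cf: 0 for cf in write_rates}
    flushed_files = {cf: [] for cf in write_rates}

    for _ in range(ticks):
        # Every family accumulates writes in memory at its own rate.
        for cf, rate in write_rates.items():
            memstore[cf] += rate

        # Assumption from the thread: if any one family is large enough to
        # require a flush, all families are flushed together.
        if any(size >= flush_threshold for size in memstore.values()):
            for cf in memstore:
                if memstore[cf] > 0:
                    flushed_files[cf].append(memstore[cf])
                    memstore[cf] = 0

    return flushed_files


# Made-up workload: one busy family and one that receives 1% of its writes.
files = simulate_flushes({"big": 100, "small": 1}, flush_threshold=1000, ticks=100)
for cf, sizes in files.items():
    print(cf, "files:", len(sizes), "sizes:", sizes)
```

With these made-up rates, both families end up with 10 flushed files, but each
file from the small family holds 10 units against 1000 for the busy one, which
is the "lot of unnecessary and small store files" pattern described above.
When the rates are equal, the files come out the same size for every family.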
