hbase-user mailing list archives

From Aji Janis <aji1...@gmail.com>
Subject Re: When to expand vertically vs. horizontally in Hbase
Date Fri, 05 Jul 2013 16:16:42 GMT
I understand that there shouldn't be an unlimited number of column families.
I am using this example on purpose to see how it comes into play.


On Fri, Jul 5, 2013 at 12:07 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> Why do you have so many column families (CFs)?
>
> It's not a question of physical limitations, but more an issue of
> data design.
>
> There aren't that many really good examples of where you would have
> multiple column families that would require more than a handful of CFs.
>
> When I teach or lecture, the example I use is an order entry system,
> where you would have the same key on order entry, pick slips, shipping,
> and invoice.
>
> That's probably the best example of where CFs come into play.
>
> I'd suggest that you go back and rethink the design if you're having more
> than a handful.
>
>
>
> On Jul 5, 2013, at 8:53 AM, Aji Janis <aji1705@gmail.com> wrote:
>
> > Asaf,
> >
> > I am using the Genre/Author stuff as an example, but yes, at the moment I
> > only have 5 column families. However, over time I may have more (no upper
> > limit decided at this point). See below for more responses.
> >
> >
> > On Wed, Jul 3, 2013 at 3:42 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:
> >
> >> Do you have only 5 static author names?
> >> Keep in mind the column family name is defined when creating the table.
> >>
> >> Regarding the tall vs. wide debate:
> >> HBase is first and foremost a key-value database, so reads and writes
> >> happen at the column-value level, and it doesn't really care about rows.
> >> But that's not entirely true. Rows come into play in the following
> >> situations:
> >> Splitting a region is per row and not per column, so a row is always
> >> saved as a whole on one region. If you have a really large row, the
> >> region size granularity depends on it. That doesn't seem to be the case
> >> here.
> >> Put/Delete creates a lock until finished. If you do intensive inserts
> >> to the same row at the same time, this might be bad for you; keeping
> >> your rows slimmer can reduce contention, but again, only if you make a
> >> lot of concurrent modifications to the same row.
> >>
> >
> > I expect batches of Put/Delete to the same row to happen by at most one
> > thread at a time, based on users' current behavior, so locking shouldn't
> > be an issue. However, I'm not sure whether saving a row to a region with
> > enough space is really an issue I need to worry about (probably because
> > I just don't know much about it yet).
> >
> >
> >> Filtering - if you need a filter which needs the whole row (there is a
> >> method you override in Filter to mark that), then a fat row will be more
> >> memory intensive. If you needed only 1/5 of your row, then maybe
> >> splitting it into 5 rows to begin with would have made a better schema
> >> design in terms of memory and I/O.
> >>
> >
> > Currently, my access pattern is to get all data for a given row. It's
> > possible that in the future we may want to apply (family/qualifier)
> > filters. There is a lot of uncertainty about use cases (client side) at
> > this point, which is why I am not entirely sure how things will look
> > months from now. I am not sure I follow this statement:
> >
> > "if you need a filter which needs the whole row (there is a method you
> > override in Filter to mark that), then a fat row will be more memory
> > intensive."
> >
> > Can you please explain? Thank you for these suggestions btw, good food
> > for thought!
> >
> >
> >>
> >> On Wednesday, July 3, 2013, Aji Janis wrote:
> >>
> >>> I have a major typo in the question so I apologize. I meant to say 5
> >>> families with 1000+ qualifiers each.
> >>>
> >>> Let's work with an example (not the greatest example here, but still).
> >>> Let's say we have a Genre class like this:
> >>>
> >>> class HistoryBooks {
> >>>
> >>> ArrayList<Books> author1;
> >>> ArrayList<Books> author2;
> >>> ArrayList<Books> author3;
> >>> ArrayList<Books> author4;
> >>> ArrayList<Books> author5;
> >>>
> >>> ...}
> >>>
> >>> Each author is a column family (let's say we only allow 5 authors per
> >>> <T>Book class). Each book per author ends up being a qualifier. In this
> >>> case, I know I have a max family count, but my qualifiers have no upper
> >>> limit. So is this scenario a case for a tall or a wide table? Why?
> >>> Thank you.
> >>>
> >>>
> >>> On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault
> >>> <bbeaudreault@hubspot.com> wrote:
> >>>
> >>>> If they are accessed mostly together they should all be a single
> >>>> column family. The key with tall or wide is based on the total byte
> >>>> size of each KeyValue. Your cells would need to be quite large for 50
> >>>> to become a problem. I still would recommend using a single CF though.
> >>
>
>
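
For readers following the tall vs. wide question, here is a minimal sketch of the two layouts for the Genre/Author example from the thread. It is plain Java and does not call the HBase client API. The class name TallVsWide, the "|" separator, and the sample author and book names are all assumptions made for illustration. In the wide layout, one row (the genre) holds a "family:qualifier" column per book. In the tall layout, author and book move into the row key, so each book is its own row and the data can be split across regions at row boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TallVsWide {
    // Hypothetical separator; a real schema would pick one that cannot
    // appear inside the key parts.
    static final String SEP = "|";

    // Wide layout: a single row key (the genre). Every book becomes one
    // "family:qualifier" column inside that one row.
    static Map<String, List<String>> wide(String genre,
                                          Map<String, List<String>> booksByAuthor) {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : booksByAuthor.entrySet()) {
            for (String book : e.getValue()) {
                columns.add(e.getKey() + ":" + book);
            }
        }
        Map<String, List<String>> table = new TreeMap<>();
        table.put(genre, columns);
        return table;
    }

    // Tall layout: every book becomes its own row. Author and book are
    // folded into the row key, so rows stay small.
    static List<String> tall(String genre,
                             Map<String, List<String>> booksByAuthor) {
        List<String> rowKeys = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : booksByAuthor.entrySet()) {
            for (String book : e.getValue()) {
                rowKeys.add(genre + SEP + e.getKey() + SEP + book);
            }
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        // Sample data, invented for this sketch.
        Map<String, List<String>> books = new TreeMap<>();
        books.put("author1", Arrays.asList("bookA", "bookB"));
        books.put("author2", Arrays.asList("bookC"));

        System.out.println("wide: " + wide("history", books));
        System.out.println("tall: " + tall("history", books));
    }
}
```

With an unbounded number of books per author, the wide layout grows one row without limit, while the tall layout grows the number of rows instead. Which is preferable still depends on the access pattern discussed in the thread, such as whether all data for a genre is always read together.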
