hbase-user mailing list archives

From Aji Janis <aji1...@gmail.com>
Subject Re: When to expand vertically vs. horizontally in Hbase
Date Fri, 05 Jul 2013 13:53:47 GMT

I am using the Genre/Author stuff as an example, but yes, at the moment I
only have 5 column families. However, over time I may have more (no upper
limit decided at this point). See below for more responses.

On Wed, Jul 3, 2013 at 3:42 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> Do you have only 5 static author names?
> Keep in mind the column family name is defined when creating the table.
> Regarding tall vs wide debate:
> HBase is first and foremost a key-value database, so it reads and writes
> at the column-value level. So it doesn't really care about rows.
> But it's not entirely true. Rows come into play in the following
> situations:
> Splitting a region is per row and not per column, thus a row will be saved
> as a whole on a region. If you have a really large row, the region size
> granularity is dependent on it. It doesn't seem to be the case here.
> Put/Delete creates a lock until finished. If you are intensive on inserts
> to the same row at the same time, this might be bad for you. Keeping your
> rows slimmer can reduce contention, but again, only if you make a lot of
> concurrent modifications to the same row.

I expect batches of Put/Delete to the same row to happen by at most one
thread at a time, based on users' current behavior. So locking shouldn't be
an issue. However, I am not sure whether a whole row having to fit in a
single region is really an issue I need to worry about (probably because I
just don't know much about it yet).
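
To make the region-granularity point a bit more concrete, here is a minimal
sketch in plain Java (no HBase client involved) of the two layouts for the
Genre/Author example. The class name, the "|" key separator and the
"bookNNNN" ids are made up for illustration. It only counts distinct row
keys, since a region can be split between rows but not inside one.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the two schema options; not HBase API code.
class TallVsWide {

    // Wide layout: one row per genre, authors as column families,
    // books as qualifiers. Every cell shares the same row key.
    static Set<String> wideRowKeys(String genre, String[] authors, int booksPerAuthor) {
        Set<String> rows = new LinkedHashSet<>();
        for (String author : authors) {
            for (int b = 0; b < booksPerAuthor; b++) {
                rows.add(genre); // author and book live in family:qualifier
            }
        }
        return rows;
    }

    // Tall layout: author and book id are folded into the row key,
    // so each book becomes its own (slim) row.
    static Set<String> tallRowKeys(String genre, String[] authors, int booksPerAuthor) {
        Set<String> rows = new LinkedHashSet<>();
        for (String author : authors) {
            for (int b = 0; b < booksPerAuthor; b++) {
                rows.add(genre + "|" + author + "|" + String.format("book%04d", b));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] authors = {"author1", "author2", "author3", "author4", "author5"};
        // One row: the region holding it cannot be split any further.
        System.out.println("wide rows: " + wideRowKeys("history", authors, 1000).size());
        // 5000 rows: a split point can fall between any two of them.
        System.out.println("tall rows: " + tallRowKeys("history", authors, 1000).size());
    }
}
```

With 5 authors and 1000 books each, the wide layout yields a single row and
the tall layout yields 5000. Whether that matters depends on how large the
single row gets in bytes, which the sketch does not model.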

> Filtering - if you need a filter which needs all the row (there is a method
> you override in Filter to mark that) then a fat row will be more memory
> intensive. If you needed only 1/5 of your row, then maybe splitting it to 5
> rows to begin with would have made a better schema design in terms of
> memory and I/O.

Currently, my access pattern is to get all data for a given row. It's
possible in the future we may want to apply (family/qualifier) filters.
There is a lot of uncertainty on use cases (client side) at this point
which is why I am not entirely sure on how things will look months from
now. I am not sure I follow this statement:

"if you need a filter which needs all the row (there is a method you
override in Filter to mark that) then a fat row will be more memory
intensive"

Can you please explain? Thank you for these suggestions btw, good food for
thought.

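One way to read the quoted statement, as a rough sketch: a filter that
decides per row has to see every cell of that row before it can decide, so
the whole row is held in memory at once. The method mentioned is probably
`Filter.hasFilterRow()`, but that is a guess. The code below is plain Java
and does not use the HBase API; the `Cell` class and the byte sizes are
invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of row-level filtering cost; not HBase API code.
class RowFilterSketch {

    // A cell reduced to the two fields this sketch needs.
    static final class Cell {
        final String row;
        final int valueBytes;

        Cell(String row, int valueBytes) {
            this.row = row;
            this.valueBytes = valueBytes;
        }
    }

    // A row-level filter must buffer all cells of the current row before
    // deciding, so peak memory is the size of the largest row.
    // Assumes cells arrive sorted by row key, as in a scan.
    static int peakRowBufferBytes(List<Cell> cellsSortedByRow) {
        int peak = 0;
        int current = 0;
        String currentRow = null;
        for (Cell c : cellsSortedByRow) {
            if (!c.row.equals(currentRow)) {
                currentRow = c.row; // new row: previous buffer is released
                current = 0;
            }
            current += c.valueBytes;
            peak = Math.max(peak, current);
        }
        return peak;
    }

    public static void main(String[] args) {
        List<Cell> fat = new ArrayList<>();
        List<Cell> slim = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            fat.add(new Cell("history", 100));            // one row, 5 cells
            slim.add(new Cell("history|author" + i, 100)); // 5 rows, 1 cell each
        }
        System.out.println("fat row peak:  " + peakRowBufferBytes(fat));
        System.out.println("slim rows peak: " + peakRowBufferBytes(slim));
    }
}
```

Here the fat row peaks at 500 buffered bytes and the five slim rows peak at
100, which is the "if you needed only 1/5 of your row" case from the quote.
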
> On Wednesday, July 3, 2013, Aji Janis wrote:
> > I have a major typo in the question so I apologize. I meant to say 5
> > families with 1000+ qualifiers each.
> >
> > Let's work with an example (not the greatest example here, but still).
> > Let's say we have a Genre class like this:
> >
> > class HistoryBooks {
> >
> >  ArrayList<Books> author1;
> >  ArrayList<Books> author2;
> >  ArrayList<Books> author3;
> >  ArrayList<Books> author4;
> >  ArrayList<Books> author5;
> >
> > ...}
> >
> > Each author is a column family (let's say we only allow 5 authors per
> > <T>Book class). The book per author ends up being the qualifier. In this
> > case, I know I have a max family count but my qualifiers have no upper
> > limit. So is this scenario a case for a tall or a wide table? Why?
> > Thank you.
> >
> >
> > On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault
> > <bbeaudreault@hubspot.com> wrote:
> >
> > > If they are accessed mostly together they should all be a single column
> > > family. The key with tall or wide is based on the total byte size of
> > > each KeyValue. Your cells would need to be quite large for 50 to become
> > > a problem. I still would recommend using a single CF though.
> > > —
> > > Sent from iPhone
