hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Hbase and linear scaling with small write intensive clusters
Date Tue, 22 Sep 2009 22:39:27 GMT
Split your table in advance?  You can do it from the UI or the shell (script
it?).
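For example, later releases let you pass split points when the table is
created (the SPLITS option is not in the 0.20 shell, so read this as a
sketch of the idea; the split keys below are illustrative):

  create 'Inbox', 'items', {SPLITS => ['0250000', '0500000', '0750000']}

With the regions laid out up front, writers fan out across region servers
immediately instead of all hitting the single region a new table starts with.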

Regarding the same performance for 10 nodes as for 5: how many regions are in
your table?  What happens if you pile on more data?

The split algorithm will be sped up in coming versions for sure.  Two
minutes seems like a long time.  Is the cluster under load at the time?

St.Ack



On Tue, Sep 22, 2009 at 3:14 PM, Molinari, Guy <Guy.Molinari@disney.com> wrote:

> Hello all,
>
>     I've been working with HBase for the past few months on a proof of
> concept/technology adoption evaluation.    I wanted to describe my
> scenario to the user/development community to get some input on my
> observations.
>
>
>
> I've written an application that is comprised of two tables.  It models
> a classic many-to-many relationship.   One table stores "User" data and
> the other represents an "Inbox" of items assigned to that user.    The
> key for the user is a string generated by the JDK's UUID.randomUUID()
> method.   The key for the "Inbox" is a monotonically increasing value.
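>
> Roughly, the key generation looks like this (a sketch; the AtomicLong
> counter stands in for however the increasing value is really produced):
>
>     import java.util.UUID;
>     import java.util.concurrent.atomic.AtomicLong;
>
>     public class Keys {
>         // "Inbox" keys: a zero-padded increasing counter, so each new
>         // row sorts to the end of the table and lands in the last region.
>         private static final AtomicLong INBOX_SEQ = new AtomicLong();
>
>         // "User" keys: random UUID strings, spread over the key space.
>         static String userKey() { return UUID.randomUUID().toString(); }
>
>         static String inboxKey() {
>             return String.format("%012d", INBOX_SEQ.incrementAndGet());
>         }
>     }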
>
>
>
> It works just fine.   I've reviewed the performance tuning info on the
> HBase WIKI page.   The client application spins up 100 threads, each one
> grabbing a range of keys (for the "Inbox").    The I/O mix is about
> 50/50 read/write.   The test client inserts 1,000,000 "Inbox" items and
> verifies the existence of a "User" (FK check).   It uses column families
> to maintain integrity of the relationships.
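>
> In outline, each worker thread runs a loop like the one below (a condensed
> sketch against the 0.20 client API, error handling omitted; the table and
> family names, the key range [startId, endId), and the userKeyFor() helper
> are stand-ins for my real code):
>
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     HBaseConfiguration conf = new HBaseConfiguration();
>     HTable users = new HTable(conf, "User");
>     HTable inbox = new HTable(conf, "Inbox");
>     for (long id = startId; id < endId; id++) {
>         // Read side: verify the owning "User" row exists (the FK check).
>         // userKeyFor() is a stand-in for how a User is chosen per item.
>         byte[] userKey = Bytes.toBytes(userKeyFor(id));
>         if (users.get(new Get(userKey)).isEmpty()) continue;
>         // Write side: insert the "Inbox" row keyed by the sequence value.
>         Put put = new Put(Bytes.toBytes(String.format("%012d", id)));
>         put.add(Bytes.toBytes("items"), Bytes.toBytes("user"), userKey);
>         inbox.put(put);
>     }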
>
>
>
> I'm running versions 0.19.3 and 0.20.0.    The behavior is basically the
> same.   The cluster consists of 10 nodes.   I'm running my namenode and
> HBase master on one dedicated box.   The other 9 run datanodes/region
> servers.
>
>
>
> I'm seeing around 1,000 "Inbox" transactions per second (dividing the total
> count inserted by the total time for the batch).    The problem is that I
> get the same results with 5 nodes as with 10.    Not quite what I was
> expecting.
>
>
>
> The bottleneck seems to be the splitting algorithms.   I've set my
> region size to 2M.   I can see that as the process moves forward, HBase
> pauses and re-distributes the data and splits regions.   It does this
> first for the "Inbox" table and then pauses again and redistributes the
> "User" table.    This pause can be quite long.   Often 2 minutes or
> more.
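>
> For reference, by "region size" I mean hbase.hregion.max.filesize in
> hbase-site.xml, which I lowered from its 256 MB default:
>
>     <property>
>       <name>hbase.hregion.max.filesize</name>
>       <value>2097152</value>  <!-- 2 MB; the default is 268435456 -->
>     </property>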
>
>
>
> Can the key ranges be pre-defined in advance to avoid this?   I
> would rather not burden application developers/DBAs with this.
> Perhaps the split/redistribution algorithms could be sped up?   Any
> configuration recommendations?
>
>
>
> Thanks in advance,
>
> Guy
>
>
