hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase Client Performance Bottleneck in a Single Virtual Machine
Date Mon, 04 Nov 2013 20:11:11 GMT
You might try asynchbase, Michael.
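asynchbase is a separate, fully asynchronous HBase client. Its own API is not shown here. The following is a JDK-only sketch of the general non-blocking pattern such clients are built around: a caller issues many requests at once and collects the results through futures, instead of tying up one blocked thread per call. `AsyncFanOut`, `lookup` and `fetchAll` are hypothetical names, and `lookup` completes after a simulated delay rather than contacting HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the non-blocking request pattern (NOT the asynchbase API).
 * lookup() is a hypothetical stand-in that completes a future after a delay.
 */
final class AsyncFanOut {
    // Two daemon threads simulate the client's I/O layer; callers never block a thread per request.
    private static final ScheduledExecutorService IO =
            Executors.newScheduledThreadPool(2, runnable -> {
                Thread t = new Thread(runnable, "simulated-io");
                t.setDaemon(true); // do not keep the JVM alive after main returns
                return t;
            });

    /** Hypothetical async lookup: completes with "value-<key>" after delayMillis. */
    static CompletableFuture<String> lookup(String key, long delayMillis) {
        CompletableFuture<String> result = new CompletableFuture<>();
        IO.schedule(() -> { result.complete("value-" + key); }, delayMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    /** Issues n lookups at once, then waits for all of them; results keep request order. */
    static List<String> fetchAll(int n, long delayMillis) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pending.add(lookup("row" + i, delayMillis));
        }
        // allOf completes when every lookup has completed; join() rethrows failures unchecked.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        List<String> values = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            values.add(f.join());
        }
        return values;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> values = fetchAll(100, 20);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(values.size() + " lookups in " + elapsedMs + " ms");
    }
}
```

Because the 100 simulated lookups overlap, the total time stays close to one delay rather than 100 delays; a real asynchronous client gets the same effect from non-blocking I/O.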

On Mon, Nov 4, 2013 at 11:00 AM, <Michael.Grundvig@high5games.com> wrote:

> Not yet; this is just a load-test client. It literally does nothing but
> create threads to talk to HBase and run 4 different calls. Nothing else is
> done in the app at all.
> To eliminate even more of our code from the loop, we tried removing our
> connection pool entirely and using a single connection per thread: no
> improvement. Then we tried creating the HTableInterface (all calls are
> against the same table) at the time of connection creation. This meant
> threads, connections, and table interfaces were all 1:1 and never passed
> around. No performance improvement.
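The 1:1 thread-to-connection-to-table arrangement described here can be expressed with a `ThreadLocal`, so each thread lazily creates its own handle on first use and reuses it afterwards. A minimal sketch: `TableHandle` is a hypothetical placeholder for whatever wraps the per-thread connection and HTableInterface (the 0.94 `HTable` javadoc describes the class as not thread-safe, which is one reason to keep one per thread).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: one lazily created, reused handle per thread via ThreadLocal. */
final class PerThreadHandles {
    /** Hypothetical placeholder for a per-thread connection + table interface. */
    static final class TableHandle {
        private static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet(); // unique id per created handle
    }

    // withInitial runs once per thread, the first time that thread calls get().
    private static final ThreadLocal<TableHandle> HANDLE =
            ThreadLocal.withInitial(TableHandle::new);

    static TableHandle current() {
        return HANDLE.get();
    }

    /** Starts `threads` threads that each use their handle 10 times; returns distinct handle ids seen. */
    static int distinctHandles(int threads) {
        Set<Integer> ids = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                TableHandle first = current();
                for (int call = 0; call < 10; call++) {
                    // Every call on this thread must see the same handle object.
                    if (current() != first) throw new IllegalStateException("handle changed");
                }
                ids.add(first.id);
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHandles(10) + " threads, one handle each");
    }
}
```

With a thread pool, each pooled thread keeps its handle for as long as the thread lives, so closing the handles needs a separate hook, for example when the pool is shut down.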
> Long story short, with a single thread it's fast; once we start
> multithreading, it slows down. CPU usage, memory usage, etc. are all
> negligible. The performance isn't terrible; it's probably good enough for
> the vast majority of users, but it's not good enough for our app. With one
> thread, a call might take 5 milliseconds. With 10 threads each calling more
> often (a 40-millisecond delay between calls), the call time increases to
> 15-30 milliseconds. At our throughput rates, that's a serious concern.
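One pattern that produces these symptoms (latency rising with thread count while CPU stays low) is threads queuing on a single shared lock. The sketch is purely illustrative and does not claim to identify the actual bottleneck in this client: a hypothetical `simulatedCall` holds one shared monitor while sleeping, so calls are serialized and mean latency grows roughly in proportion to the number of threads.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: a shared lock around each call makes latency scale with thread count. */
final class ContentionDemo {
    private static final Object SHARED_LOCK = new Object();

    /** Hypothetical call: holds the shared lock for serviceMillis while sleeping (CPU stays idle). */
    static void simulatedCall(long serviceMillis) {
        synchronized (SHARED_LOCK) {
            try {
                Thread.sleep(serviceMillis); // sleep does not release the monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs `threads` threads, each making `calls` back-to-back calls; returns mean latency in ms. */
    static double meanLatencyMillis(int threads, int calls, long serviceMillis) {
        AtomicLong totalNanos = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < calls; i++) {
                    long start = System.nanoTime();
                    simulatedCall(serviceMillis);
                    totalNanos.addAndGet(System.nanoTime() - start);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return totalNanos.get() / 1e6 / ((double) threads * calls);
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 4, 10}) {
            System.out.printf("%2d threads: mean latency %.1f ms%n",
                    threads, meanLatencyMillis(threads, 20, 2));
        }
    }
}
```

A profiler or jstack output showing many client threads BLOCKED on the same monitor would be the signature of this kind of serialization.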
> We are going to fire up a profiler next to see what we can find.
> -Mike
> -----Original Message-----
> From: Vladimir Rodionov [mailto:vrodionov@carrieriq.com]
> Sent: Monday, November 04, 2013 12:50 PM
> To: user@hbase.apache.org
> Subject: RE: HBase Client Performance Bottleneck in a Single Virtual
> Machine
> Michael, have you tried jstack on your client application?
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
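jstack prints a dump of every thread in a running JVM. For a quick in-process check along the same lines, the standard `java.lang.management` API can take a similar dump; this sketch keeps only BLOCKED threads, with the monitor each one waits for and that monitor's owner. `BlockedThreadReport` and its methods are hypothetical names, and sampling it a few times under load is only a rough substitute for jstack or a profiler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Sketch: list BLOCKED threads, the lock they wait for, and its owner. */
final class BlockedThreadReport {
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (false, false): skip full held-monitor lists; lock name and owner are still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** Self-check: blocks one thread on a monitor held by another and confirms it is reported. */
    static boolean detectsDemoBlock() {
        final Object lock = new Object();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                lockHeld.countDown();
                awaitQuietly(release); // keeps the monitor while parked
            }
        }, "demo-holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Reached only after the holder releases the lock.
            }
        }, "demo-waiter");
        holder.start();
        awaitQuietly(lockHeld);
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.yield();
        }
        boolean seen = false;
        for (String line : blockedThreads()) {
            if (line.startsWith("demo-waiter blocked on")) seen = true;
        }
        release.countDown();
        joinQuietly(holder);
        joinQuietly(waiter);
        return seen;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("self-check passed: " + detectsDemoBlock());
        blockedThreads().forEach(System.out::println);
    }
}
```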
> ________________________________________
> From: Michael.Grundvig@high5games.com [Michael.Grundvig@high5games.com]
> Sent: Sunday, November 03, 2013 7:46 PM
> To: user@hbase.apache.org
> Subject: HBase Client Performance Bottleneck in a Single Virtual Machine
> Hi all; I posted this as a question on StackOverflow as well, but realized
> I should have gone straight to the horse's mouth with my question. Sorry for
> the double post!
> We are running a series of HBase tests to see if we can migrate one of our
> existing datasets from an RDBMS to HBase. We are running 15 nodes with 5
> ZooKeeper servers and HBase 0.94.12 for this test.
> We have a single table with three column families and a key that
> distributes very well across the cluster. All of our queries are direct
> lookups; no searching or scanning. Since HTablePool is now frowned upon,
> we are using Apache Commons Pool and a simple connection factory to create
> a pool of connections and use them in our threads. Each thread creates an
> HTable instance as needed and closes it when done. We can't identify any
> leaks.
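The pooling arrangement described here (Commons Pool plus a connection factory) can be sketched without any library: a fixed number of pre-created resources handed out and returned through a blocking queue. This is a JDK-only stand-in, not Commons Pool's API; `SimplePool` and `withResource` are hypothetical names, and in the described setup the pooled resource would be a connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Minimal fixed-size pool: borrow with take(), always return in finally. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pre-create every resource up front
        }
    }

    /** Borrows a resource (blocking while none are idle), applies work, and always returns it. */
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while borrowing", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // cannot overflow: only borrowed resources come back
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(2, StringBuilder::new);
        System.out.println(pool.withResource(sb -> sb.append("hello").toString()));
    }
}
```

With Commons Pool 2 the same shape is a `GenericObjectPool` whose factory creates connections, with `borrowObject()` paired with `returnObject()` in a `finally` block.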
> If we run a single thread and just make lots of random calls sequentially,
> the performance is quite good. Everything works great until we try to
> scale. As we add more threads and try to get more work done in a single
> VM, performance degrades quickly. The client code simply runs either one of
> several gets or a single put at a given frequency, then waits until its
> next scheduled run; we use this to simulate the workload from external
> clients. With a single thread, we see call times of 2-3 milliseconds, which
> is acceptable. As we add more threads, the call time increases quickly. The
> strange part is that if we add more VMs, the times hold steady across all of
> them, so the bottleneck is clearly in the client VM and not the cluster.
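The pacing described here (one call per client at a fixed frequency, then wait for the next slot) maps naturally onto `ScheduledExecutorService.scheduleAtFixedRate`. A minimal sketch, assuming the call under test can be wrapped as a plain `Runnable`; `PacedLoad` and `run` are hypothetical names, and the placeholder call in `main` sleeps instead of contacting HBase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a paced load generator that records per-call latency in microseconds. */
final class PacedLoad {
    static List<Long> run(int clients, long periodMillis, int callsPerClient, Runnable call) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(clients);
        List<Long> latenciesMicros = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch finished = new CountDownLatch(clients * callsPerClient);
        try {
            for (int c = 0; c < clients; c++) {
                AtomicInteger remaining = new AtomicInteger(callsPerClient);
                scheduler.scheduleAtFixedRate(() -> {
                    // Once this client has made its quota, later ticks do nothing.
                    if (remaining.getAndDecrement() <= 0) return;
                    long start = System.nanoTime();
                    try {
                        call.run();
                    } catch (RuntimeException e) {
                        // Keep the schedule alive; a fuller harness would count failures.
                    }
                    latenciesMicros.add((System.nanoTime() - start) / 1_000);
                    finished.countDown();
                }, 0, periodMillis, TimeUnit.MILLISECONDS);
            }
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        synchronized (latenciesMicros) {
            return new ArrayList<>(latenciesMicros);
        }
    }

    public static void main(String[] args) {
        // 10 clients, one call every 40 ms each, with a hypothetical 2 ms call.
        List<Long> latencies = run(10, 40, 25, () -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Collections.sort(latencies);
        System.out.println("median us: " + latencies.get(latencies.size() / 2)
                + ", max us: " + latencies.get(latencies.size() - 1));
    }
}
```

Fixed-rate scheduling never runs the same task concurrently: if a call takes longer than its period, the next run for that client starts late rather than overlapping.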
> We can get a huge amount of processing happening across the cluster very
> easily; it just takes a lot of client-side VMs to do it. We know the
> contention isn't in the connection pool, because we see the problem even
> when we have more connections than threads. Unfortunately, the times spiral
> out of control very quickly. We need to support at least 128 threads in
> practice, but most importantly I want to support 500 updates/sec and 250
> gets/sec. In theory, this should be a piece of cake for the cluster, since
> we can do FAR more work than that with a few VMs, but we don't even get
> close to it with a single VM.
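A back-of-the-envelope check on these targets with Little's law, $L = \lambda W$ (average requests in flight equal arrival rate times latency). This is an editorial estimate from figures quoted in this thread, not a measurement: it uses the combined target rate, the 3 ms upper end of the single-thread latency, and the 30 ms upper end reported with 10 threads.

```latex
\begin{aligned}
\lambda &= 500 + 250 = 750\ \text{requests/s} \\
L &= \lambda W \\
L\big|_{W = 3\,\text{ms}} &= 750 \times 0.003 = 2.25\ \text{requests in flight} \\
L\big|_{W = 30\,\text{ms}} &= 750 \times 0.030 = 22.5\ \text{requests in flight} \\
\frac{128\ \text{threads}}{\lambda} &= \frac{128}{750}\ \text{s} \approx 0.171\ \text{s between calls per thread}
\end{aligned}
```

Even at 30 ms per call, the target needs only about 23 requests in flight on average, well under 128 threads, and each thread would make one call roughly every 171 ms. That suggests the thread count itself is not the constraint; the difficulty is that per-call latency grows as threads are added.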
> So my question: how do people building high-performance apps with HBase
> get around this? What approach are others using for connection pooling in a
> multi-threaded environment? There seems to be surprisingly little info
> about this on the web, considering HBase's popularity. Is there some client
> setting we need that makes it perform better in a threaded environment? We
> are going to try caching HTable instances next, but that's a total guess.
> There are ways to offload work to other VMs, but we really want to avoid
> that: the cluster clearly can handle the load, and offloading would
> dramatically decrease the application's performance in critical areas.
> Any help is greatly appreciated! Thanks!
> -Mike
