hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: HBase Client Performance Bottleneck in a Single Virtual Machine
Date Mon, 04 Nov 2013 05:58:25 GMT
If you have used con.getTable(), the close() on the HTable won't close the
underlying connection
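
For example (a quick sketch; "t1" is just a placeholder table name):

HConnection con = HConnectionManager.createConnection(conf);
HTableInterface table = con.getTable("t1");
table.close();  // releases the table's resources/buffers only
// ... the connection is still open and reusable here ...
con.close();    // only this closes the underlying connection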

-Anoop-

On Mon, Nov 4, 2013 at 11:21 AM, <Michael.Grundvig@high5games.com> wrote:

> Our current usage is how I would do this in a typical database app, with
> the table acting like a statement. It looks like this:
>
> Connection connection = null;
> HTableInterface table = null;
> try {
>         connection = pool.acquire();
>         table = connection.getTable(tableName);
>         // Do work
> } finally {
>         if (table != null) table.close();  // guard: getTable() may have thrown
>         pool.release(connection);
> }
>
> Is this incorrect? The API docs say close() "Releases any resources held
> or pending changes in internal buffers." I didn't interpret that as
> closing the underlying connection. Thanks!
>
> -Mike
>
> -----Original Message-----
> From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com]
> Sent: Sunday, November 03, 2013 11:43 PM
> To: user@hbase.apache.org
> Cc: larsh@apache.org
> Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> Machine
>
> Hey Michael - Per the API documentation, closing the HTable instance would
> close the underlying resources too. Hope you are aware of it.
>
>
> On Mon, Nov 4, 2013 at 11:06 AM, <Michael.Grundvig@high5games.com> wrote:
>
> > Hi Lars, at application startup the pool is created with X number of
> > connections using the first method you indicated:
> > HConnectionManager.createConnection(conf). We store each connection in
> > the pool automatically and serve it up to threads as they request it.
> > When a thread is done using the connection, it returns it to the
> > pool. The connections are not created and closed per thread, but
> > only once for the entire application. We are using the
> > GenericObjectPool from Apache Commons Pool as the foundation of our
> > connection pooling approach. Our entire pool implementation really
> > consists of just a couple of overridden methods that specify how to
> > create a new connection and close it (sketched below). The
> > GenericObjectPool class does all the rest. See here for details:
> > http://commons.apache.org/proper/commons-pool/
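> >
> > In sketch form, those overrides look roughly like this (simplified;
> > class and field names here are illustrative, and it assumes
> > commons-pool 1.x with generics):
> >
> > import org.apache.commons.pool.BasePoolableObjectFactory;
> > import org.apache.commons.pool.impl.GenericObjectPool;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.client.HConnection;
> > import org.apache.hadoop.hbase.client.HConnectionManager;
> >
> > public class ConnectionFactory extends BasePoolableObjectFactory<HConnection> {
> >     private final Configuration conf;
> >
> >     public ConnectionFactory(Configuration conf) { this.conf = conf; }
> >
> >     @Override
> >     public HConnection makeObject() throws Exception {
> >         // how to create a new connection
> >         return HConnectionManager.createConnection(conf);
> >     }
> >
> >     @Override
> >     public void destroyObject(HConnection con) throws Exception {
> >         // how to close it when the pool discards it
> >         con.close();
> >     }
> > }
> >
> > // at startup, a pool of X connections:
> > // GenericObjectPool<HConnection> pool =
> > //     new GenericObjectPool<HConnection>(new ConnectionFactory(conf));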
> >
> > Each thread is getting an HTable instance as needed and then closing it
> > when done. The only thing we are not doing is using the
> > createConnection method that takes in an ExecutorService as that
> > wouldn't work in our model. Our app is like a web application - the
> > thread pool is managed outside the scope of our application code so we
> > can't assume the service is available at connection creation time.
> > Thanks!
> >
> > -Mike
> >
> >
> > -----Original Message-----
> > From: lars hofhansl [mailto:larsh@apache.org]
> > Sent: Sunday, November 03, 2013 11:27 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> > Hi Michael,
> >
> > Can you try to create a single HConnection in your client:
> > HConnectionManager.createConnection(Configuration conf) or
> > HConnectionManager.createConnection(Configuration conf,
> > ExecutorService
> > pool)
> >
> > Then use HConnection.getTable(...) each time you need to do an operation.
> >
> > I.e.:
> >
> > Configuration conf = ...;
> > ExecutorService pool = ...;
> > // create a single HConnection for your VM
> > HConnection con = HConnectionManager.createConnection(conf, pool);
> > // reuse the connection for many tables, even in different threads
> > HTableInterface table = con.getTable(...);
> > // use the table even for only a few operations
> > table.close();
> > ...
> > table = con.getTable(...);
> > // use the table even for only a few operations
> > table.close();
> > ...
> > // at the end, close the connection
> > con.close();
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: "Michael.Grundvig@high5games.com" <Michael.Grundvig@high5games.com>
> > To: user@hbase.apache.org
> > Sent: Sunday, November 3, 2013 7:46 PM
> > Subject: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> >
> > Hi all; I posted this as a question on StackOverflow as well but
> > realized I should have gone straight to the horse's mouth with my
> > question. Sorry for the double post!
> >
> > We are running a series of HBase tests to see if we can migrate one of
> > our existing datasets from an RDBMS to HBase. We are running 15 nodes
> > with 5 ZooKeepers and HBase 0.94.12 for this test.
> >
> > We have a single table with three column families and a key that
> > distributes very well across the cluster. All of our queries are
> > direct look-ups; no searching or scanning. Since
> > HTablePool is now frowned upon, we are using the Apache Commons pool
> > and a simple connection factory to create a pool of connections and
> > use them in our threads. Each thread creates an HTable instance as
> > needed and closes it when done. There are no leaks we can identify.
> >
> > If we run a single thread and just do lots of random calls
> > sequentially, the performance is quite good. Everything works great
> > until we start trying to scale the performance. As we add more threads
> > and try to get more work done in a single VM, we start seeing
> > performance degrade quickly. The client code simply attempts to
> > run either one of several gets or a single put at a given frequency,
> > then waits until the next scheduled run; we use this to simulate the
> > workload from external clients (see the sketch below). With a single
> > thread, we see call times in the 2-3 millisecond range, which is acceptable.
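> >
> > For reference, each worker thread does essentially this per iteration
> > (a rough sketch; the table name, rowKey, and intervalMs are placeholders):
> >
> > HConnection con = pool.acquire();        // our commons-pool wrapper
> > try {
> >     HTableInterface table = con.getTable("data");
> >     try {
> >         table.get(new Get(rowKey));      // or a single Put, per the set rate
> >     } finally {
> >         table.close();                   // closes the table, not the connection
> >     }
> > } finally {
> >     pool.release(con);                   // return the connection to the pool
> > }
> > Thread.sleep(intervalMs);                // wait until the next scheduled call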
> >
> > As we add more threads, this call time starts increasing quickly. What's
> > strange is that if we add more VMs, the times hold steady across them
> > all, so clearly it's a bottleneck in the running instance and not the
> > cluster.
> > We can get a huge amount of processing happening across the cluster
> > very easily - it just takes a lot of VMs on the client side to do
> > it. We know the contention isn't in the connection pool, as we see the
> > problem even when we have more connections than threads.
> > Unfortunately, the times are spiraling out of control very quickly. We
> > need it to support at least 128 threads in practice, but most
> > importantly, I want to support 500 updates/sec and 250 gets/sec. In
> > theory, this should be a piece of cake for the cluster, as we can do
> > FAR more work than that with a few VMs, but we don't even get close to
> > this with a single VM.
> >
> > So my question: how do people building high-performance apps with
> > HBase get around this? What approach are others using for connection
> > pooling in a multi-threaded environment? There seems to be
> > surprisingly little info about this on the web considering
> > HBase's popularity. Is there some client setting we need to use that
> > makes it perform better in a threaded environment? We are going to try
> > caching HTable instances next, but that's a total guess. There are
> > solutions for offloading work to other VMs, but we really want to avoid
> > this as clearly the cluster can handle the load, and it would dramatically
> > decrease the application performance in critical areas.
> >
> > Any help is greatly appreciated! Thanks!
> > -Mike
> >
>
>
>
> --
> It's just about how deep your longing is!
>
