hbase-issues mailing list archives

From "Gary Helmling (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-3553) number of active threads in HTable's ThreadPoolExecutor
Date Thu, 24 Feb 2011 21:27:38 GMT

     [ https://issues.apache.org/jira/browse/HBASE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Gary Helmling updated HBASE-3553:

    Attachment: benchmark_results.txt

I tested this with an 8 slave cluster on EC2 to confirm the performance improvements.  After
adding multi-get support to YCSB, I see a throughput increase of 30-50% and an average latency
reduction of 22-34% when the thread pool size is set correctly.  More detailed results are
attached for anyone interested.

One thing to note is that increasing the client YCSB threads from 16 to 32 actually decreased
the differential from the single-threaded pool, so 8 worker threads x 32 HTable instances
was likely losing some performance to client thread contention.  For clusters with highly
concurrent clients (like webapps), it may be advantageous to tune the "hbase.htable.threads.max"
value down from the default of "# of region servers".  A future improvement could be to allow
use of a shared, configurable thread pool as well.

> number of active threads in HTable's ThreadPoolExecutor
> -------------------------------------------------------
>                 Key: HBASE-3553
>                 URL: https://issues.apache.org/jira/browse/HBASE-3553
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.90.1
>            Reporter: Himanshu Vashishtha
>             Fix For: 0.90.2
>         Attachments: ThreadPoolTester.java, benchmark_results.txt
> Using a ThreadPoolExecutor with corePoolSize = 0 and a LinkedBlockingQueue as the
> collection that holds incoming runnable tasks seems to have the effect of running only 1
> thread, irrespective of the maximum pool size set by reading the property
> hbase.htable.threads.max (or the number of RS). (This is what I infer from reading the
> source code of the ThreadPoolExecutor class in JDK 1.6.)
> On a 3 node ec2 cluster, a full table scan with approx 9m rows results in nearly the same
> timing with a sequential scanner (240 secs) and with a Coprocessor scan (230 secs) that
> uses HTable's pool to submit callable objects for each region.
> I wrote a test class that creates a similar thread pool and tests whether the pool size
> ever grows beyond 1. It confirms that the pool size remains 1 though it executed 100
> It seems the desired behavior was to release all resources when the client is done reading,
> but this can be achieved by setting allowCoreThreadTimeOut to true (after setting a
> positive corePoolSize).
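
The pool behavior described in the issue can be reproduced with a small standalone sketch
like the one below. It is not the attached ThreadPoolTester.java. The class name
PoolGrowthDemo, the helper largestPoolSize, the 60 second keep-alive, and the task count and
task duration are all assumptions made for illustration. The sketch submits tasks from a
single thread and reports the largest number of threads the pool ever held, first with
corePoolSize = 0 and then with a positive corePoolSize plus allowCoreThreadTimeOut(true).

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {

    // Runs `tasks` short jobs on a pool backed by an unbounded queue and
    // returns the largest number of threads the pool ever held.
    // All parameter values used in main() are illustrative assumptions.
    static int largestPoolSize(int corePoolSize, int maxPoolSize, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize,
                60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>()); // unbounded: offer() never fails

        if (corePoolSize > 0) {
            // Lets idle core threads exit after the keep-alive time, so the
            // pool can still shrink to zero when the client goes quiet.
            pool.allowCoreThreadTimeOut(true);
        }

        for (int i = 0; i < tasks; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(2); // stand-in for a short unit of work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("core=0, max=8: " + largestPoolSize(0, 8, 100));
        System.out.println("core=8, max=8: " + largestPoolSize(8, 8, 100));
    }
}
```

With corePoolSize = 0 the executor starts one thread for the first task and then queues every
later task, because threads beyond the core size are only added when the queue rejects an
offer, and an unbounded LinkedBlockingQueue never does. With corePoolSize = 8 each of the
first 8 submissions starts a new thread, so the second call should report 8 while the first
reports 1.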

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira