Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 8 Oct 2015 08:18:26 +0000 (UTC)
From: "Nicolas Liochon (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12729716.1406289986000.54543.1444292306730@Atlassian.JIRA>
In-Reply-To: <JIRA.12729716.1406289986000@Atlassian.JIRA>
References: <JIRA.12729716.1406289986000@Atlassian.JIRA>
 <JIRA.12729716.1406289986038@arcas>
Subject: [jira] [Commented] (HBASE-11590) use a specific ThreadPoolExecutor
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948255#comment-14948255 ] 

Nicolas Liochon commented on HBASE-11590:
-----------------------------------------

Hey [~saint.ack@gmail.com]

Attached some tests comparing ThreadPoolExecutor (the one we currently use), ForkJoinPool (available in JDK 1.7+), and LifoThreadPoolExecutorSQP (the one mentioned in the Stack Overflow discussion).

- The critical use case is:
   1) do a table.batch(puts) that needs a lot of threads
   2) then do a loop { table.get(get) }. This needs a single thread, but each call may use any of the threads in the pool, resetting that thread's keepalive timeout, so the threads may never expire.
ThreadPoolExecutor is actually worse: it tries to create a thread even if there are already enough threads available.
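
The last point can be illustrated with a minimal sketch. This is not part of the attached tests: the class and method names are made up, and the pool settings are illustrative rather than HBase's actual configuration. It runs no-op tasks strictly one at a time and reads the pool size afterwards. The JDK documents that execute() starts a new thread whenever fewer than corePoolSize threads are running, even if other workers are idle, so the pool should grow by one thread per task until it reaches the core size. Lambdas (Java 8) are used for brevity.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not HBase code.
class CorePoolGrowthDemo {

    /** Runs 'tasks' no-op tasks strictly one after another and returns the pool size afterwards. */
    static int poolSizeAfterSequentialTasks(int corePoolSize, int tasks) {
        // Illustrative settings only: core == max, unbounded queue, 60s keepalive.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, corePoolSize, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        try {
            for (int i = 0; i < tasks; i++) {
                // get() blocks until the task has finished, so at most one task is in flight.
                pool.submit(() -> { }).get();
            }
            return pool.getPoolSize();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool size after 5 sequential tasks, core=10: "
                + poolSizeAfterSequentialTasks(10, 5));
    }
}
```

Even though a single thread would be enough for this load, each of the first corePoolSize submissions starts a new thread.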

 See the code for the details, but here is the interesting case, with a thread pool of 1000 threads when we need only 1 thread.
{quote}
   * ForkJoinPool maxThread=1000, immediateGet=true, LOOP=2000000
   * ForkJoinPool total=68942ms
   * ForkJoinPool step1=68657ms
   * ForkJoinPool step2=284ms
   * ForkJoinPool threads: 6, 1006, 456, 6  <=== we have 456 threads instead of the ideal 7

   * ThreadPoolExecutor maxThread=1000, immediateGet=true, LOOP=2000000
   * ThreadPoolExecutor total=107449ms <=== very slow
   * ThreadPoolExecutor step1=107145ms
   * ThreadPoolExecutor step2=304ms
   * ThreadPoolExecutor threads: 6, 1006, 889, 6 <== keeps nearly all the threads
 
   * LifoThreadPoolExecutorSQP maxThread=1000, immediateGet=true, LOOP=2000000
   * LifoThreadPoolExecutorSQP total=4805ms <================ quite fast
   * LifoThreadPoolExecutorSQP step1=4803ms
   * LifoThreadPoolExecutorSQP step2=1ms
   * LifoThreadPoolExecutorSQP threads: 6, 248, 8, 6 <====================== removes the threads quickly
{quote}

You may want to rerun the tests to see whether you can reproduce these results. I included my results in the code.

- The root issue is that we need a LIFO poll/lock, but it does not exist.
- LifoThreadPoolExecutorSQP solves this with a LIFO queue for the threads waiting for work. But it comes with an LGPL license, and the code is not trivial. A bug there could be difficult to find. It is, however, incredible to see how much faster/better it is compared to the other pools.
- ForkJoinPool is better than TPE. It's not as good as LifoThreadPoolExecutorSQP, but it's much closer to what we need. It's available in JDK 1.7, so it looks like a safe bet for HBase 1.+.
  ForkJoinPool: threads are created only if there are waiting tasks. They expire after 2 seconds (it's hardcoded in the JDK code). They are not LIFO, and the task allocation is not as fast as the one in LifoThreadPoolExecutorSQP.
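
Why LIFO matters for thread expiry can be shown with a toy model. This is a pure simulation without real threads; it is not how LifoThreadPoolExecutorSQP is implemented, and every name and number in it is made up for illustration. One task arrives per tick and finishes within that tick. The only difference between the two variants is which idle worker gets the task: the one idle the longest (FIFO) or the one that became idle most recently (LIFO).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model, not a real thread pool.
class IdleOrderSimulation {

    /**
     * Starts with 'workers' idle workers and hands out one task per tick.
     * A worker expires once it has been idle for more than 'keepAliveTicks' ticks.
     * Returns the number of workers still alive after 'ticks' ticks.
     */
    static int survivors(int workers, int ticks, int keepAliveTicks, boolean lifo) {
        // Each element is the tick at which that worker last ran a task.
        // The head is the worker idle the longest, the tail the most recently used one.
        Deque<Integer> idle = new ArrayDeque<>();
        for (int i = 0; i < workers; i++) {
            idle.addLast(0);
        }
        for (int now = 1; now <= ticks; now++) {
            final int t = now;
            // Drop workers whose keepalive has run out.
            idle.removeIf(lastUsed -> t - lastUsed > keepAliveTicks);
            if (!idle.isEmpty()) {
                // Pick the worker that runs this tick's task.
                if (lifo) {
                    idle.pollLast();
                } else {
                    idle.pollFirst();
                }
            }
            // The chosen worker (or a fresh one if none was idle) is idle again, stamped 'now'.
            idle.addLast(now);
        }
        return idle.size();
    }

    public static void main(String[] args) {
        System.out.println("FIFO survivors: " + survivors(100, 10000, 1000, false));
        System.out.println("LIFO survivors: " + survivors(100, 10000, 1000, true));
    }
}
```

With 100 idle workers, a keepalive of 1000 ticks, and 10000 ticks of one-task-at-a-time load, the FIFO variant keeps all 100 workers alive because each one is reused every 100 ticks. The LIFO variant always reuses the same worker, so the other 99 expire and a single worker remains.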

=> Proposal: Let's migrate to ForkJoinPool. If someone has time to try LifoThreadPoolExecutorSQP, it could be interesting in the future (if the license can be changed)...
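
As a rough idea of what using ForkJoinPool as a plain task executor looks like, here is a minimal sketch. It is not the HBase client code: the class name, method name, and parameters are hypothetical. It relies only on the fact that ForkJoinPool implements ExecutorService, so Callable tasks can be submitted and their futures collected as with ThreadPoolExecutor. Lambdas (Java 8) are used for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

// Hypothetical demo class, not HBase code.
class ForkJoinAsExecutorDemo {

    /** Returns {sum of all task results, pool size observed once all tasks are done}. */
    static int[] runBatch(int parallelism, int tasks) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= tasks; i++) {
                final int value = i;
                // Stand-in for one unit of client work; it just returns its index.
                Callable<Integer> task = () -> value;
                futures.add(pool.submit(task));
            }
            int sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get();
            }
            // getPoolSize() reports worker threads started and not yet terminated.
            return new int[] { sum, pool.getPoolSize() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] result = runBatch(4, 100);
        System.out.println("sum=" + result[0] + ", pool size=" + result[1]);
    }
}
```

The observed pool size depends on timing and on the JDK version, so it is printed here only for inspection.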

> use a specific ThreadPoolExecutor
> ---------------------------------
>
>                 Key: HBASE-11590
>                 URL: https://issues.apache.org/jira/browse/HBASE-11590
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, Performance
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: tp.patch
>
>
> The JDK TPE creates all the threads in the pool. As a consequence, we create (by default) 256 threads even if we just need a few.
> The attached TPE creates threads only if we have something in the queue.
> On a PE test with replica on, it improved the 99th percentile latency by 5%.
> Warning: there are likely some race conditions, but I'm posting it here because there may be an implementation available somewhere that we can use, or a good reason not to do this. Feedback is welcome, as usual.

