hbase-dev mailing list archives

From <la...@apache.org>
Subject Re: master unhealthy issue in JitterScheduledThreadPoolExecutorImpl, or is it just me?
Date Sat, 05 Dec 2015 07:21:39 GMT
I commented on HBASE-14922, the issue that introduced this class, and included a proposed fix.
Thanks.
-- Lars
From: "larsh@apache.org" <larsh@apache.org>
To: HBase Dev List <dev@hbase.apache.org>; Elliott Clark <eclark@apache.org>
Sent: Friday, December 4, 2015 11:14 PM
Subject: master unhealthy issue in JitterScheduledThreadPoolExecutorImpl, or is it just me?

I see that locally all tests that start a mini cluster fail.
In the log I see 1000's of messages like these:

2015-12-04 22:55:48,215 ERROR [newbunny,41236,1449298547569_ChoreService_107] server.NIOServerCnxnFactory$1(44): Thread Thread[newbunny,41236,1449298547569_ChoreService_107,5,main] died
java.lang.IllegalArgumentException: bound must be greater than origin
        at java.util.concurrent.ThreadLocalRandom.nextLong(ThreadLocalRandom.java:430)
        at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.getDelay(JitterScheduledThreadPoolExecutorImpl.java:84)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1083)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

In JitteredRunnableScheduledFuture.getDelay I see this.
      long baseDelay = wrapped.getDelay(unit);
      long spreadTime = (long) (baseDelay * spread);
      long delay = baseDelay + ThreadLocalRandom.current().nextLong(-spreadTime, spreadTime);

So this fails when spreadTime is 0 (or negative), because nextLong requires the bound to be
strictly greater than the origin. I suppose the fix is simply to not add the spread if
spreadTime is <= 0. Indeed, this fixes the problem for me.
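
A minimal sketch of that guard, for illustration only. The class JitterDelaySketch and the static helper jitteredDelay are hypothetical names, not the HBase code; the real method reads baseDelay from wrapped.getDelay(unit), and the fix that was eventually committed may differ.

```java
import java.util.concurrent.ThreadLocalRandom;

public class JitterDelaySketch {

  // Hypothetical stand-in for JitteredRunnableScheduledFuture.getDelay:
  // baseDelay plays the role of wrapped.getDelay(unit).
  static long jitteredDelay(long baseDelay, double spread) {
    long spreadTime = (long) (baseDelay * spread);
    if (spreadTime <= 0) {
      // nextLong(origin, bound) throws IllegalArgumentException unless
      // bound > origin, so skip the jitter when there is no room for it.
      return baseDelay;
    }
    return baseDelay + ThreadLocalRandom.current().nextLong(-spreadTime, spreadTime);
  }

  public static void main(String[] args) {
    // The unguarded call from the stack trace: origin == bound == 0.
    try {
      ThreadLocalRandom.current().nextLong(0, 0);
    } catch (IllegalArgumentException e) {
      System.out.println("unguarded call failed: " + e.getMessage());
    }
    // With the guard, a zero or negative base delay is returned unchanged.
    System.out.println(jitteredDelay(0, 0.5));
    System.out.println(jitteredDelay(-10, 0.5));
    // A positive base delay is still jittered within [500, 1500).
    System.out.println(jitteredDelay(1000, 0.5));
  }
}
```

A delay that has already expired (0 or negative) is passed through as-is, so the task is still considered due and the worker thread no longer dies in DelayedWorkQueue.take.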

Elliott, you just added that class, mind having a look? Otherwise I'll just file a JIRA.

Thanks.
-- Lars

 