accumulo-notifications mailing list archives

From "Jeff Schmidt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4615) ThreadPool timeout when checking tserver stats is confusing
Date Thu, 22 Feb 2018 19:11:00 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373267#comment-16373267 ]

Jeff Schmidt commented on ACCUMULO-4615:
----------------------------------------

Sorry for the delay on this. I have an initial fix here: [https://github.com/jschmidt10/accumulo/commit/ce3ffae0e85f0b314af2401fd0dd054b51a51277]

I will be testing it on a deployed system shortly, but any early feedback is appreciated too.

The general idea is to:

1) Use a timeout per status-gathering task (instead of a timeout for the entire pool)
2) Change the status-gathering results to a thread-safe data structure (ConcurrentSkipListMap)
3) Add a separate property for the status timeout (per tserver)
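
As a rough illustration of the three points above, here is a minimal sketch. It is not the code from the linked commit. The class name StatusGatherSketch, the StatusFetcher interface, and the perServerTimeoutMs parameter are all hypothetical stand-ins, not Accumulo APIs; in a real deployment the timeout would come from the new configuration property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StatusGatherSketch {

  /** Hypothetical stand-in for the RPC that asks one tserver for its status. */
  interface StatusFetcher {
    String fetch(String server) throws Exception;
  }

  /**
   * Gathers status from every server. Each task is waited on with its own
   * timeout; a task that times out is cancelled and simply left out of the
   * result, while the remaining tasks are still collected.
   */
  static SortedMap<String, String> gather(
      List<String> servers, StatusFetcher fetcher, long perServerTimeoutMs) {
    // Worker threads write here concurrently, so the map must be thread-safe.
    final ConcurrentSkipListMap<String, String> results = new ConcurrentSkipListMap<>();
    // Assumption for the sketch: one thread per server, so no task waits in a queue.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
    List<Future<Void>> futures = new ArrayList<>();
    try {
      for (String server : servers) {
        Callable<Void> task = () -> {
          results.put(server, fetcher.fetch(server));
          return null;
        };
        futures.add(pool.submit(task));
      }
      for (Future<Void> future : futures) {
        try {
          // Timeout applies to this one task, not to the pool as a whole.
          future.get(perServerTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          future.cancel(true); // interrupt the slow task; the others are unaffected
        } catch (ExecutionException e) {
          // The fetch itself failed; this server is omitted from the results.
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    } finally {
      pool.shutdownNow();
    }
    return results;
  }

  public static void main(String[] args) {
    StatusFetcher fetcher = server -> {
      if (server.equals("slow")) {
        Thread.sleep(10_000); // simulate an unresponsive tserver
      }
      return "ok:" + server;
    };
    System.out.println(gather(Arrays.asList("ts2", "slow", "ts1"), fetcher, 500).keySet());
  }
}
```

Because the tasks run concurrently, time spent waiting on one slow server also counts as running time for the tasks behind it, so the total wait stays close to a single timeout when only a few servers are slow.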

> ThreadPool timeout when checking tserver stats is confusing
> -----------------------------------------------------------
>
>                 Key: ACCUMULO-4615
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4615
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.8.1
>            Reporter: Michael Wall
>            Assignee: Jeff Schmidt
>            Priority: Minor
>             Fix For: 1.9.0, 2.0.0
>
>
> If it takes longer than the configured time to gather information from all the tablet servers, the thread pool stops and processing continues with whatever has been collected. Code is https://github.com/apache/accumulo/blob/1.8/server/master/src/main/java/org/apache/accumulo/master/Master.java#L1120, default timeout is 6s.  Does not appear to be an issue prior to 1.8.
> Best case, this was really confusing.  The monitor page would have 30 tservers, then 5 tservers.  Didn't really see any other negative effects, no migrations and no balancing appeared to be affected.  Worst case though, I missed something and the master is making decisions based on incomplete information.
> [~dlmarion@comcast.net] please add more info if needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
