accumulo-dev mailing list archives

From "Josh Elser" <josh.el...@gmail.com>
Subject Re: Review Request 27654: Add introspection of long running assignments
Date Thu, 06 Nov 2014 20:47:58 GMT


> On Nov. 6, 2014, 5:47 p.m., kturner wrote:
> > server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java, line 250
> > <https://reviews.apache.org/r/27654/diff/3/?file=751140#file751140line250>
> >
> >     The compaction code remembers when it logged an exception and does not do it again. It also logs a message if the compaction becomes unstuck. One advantage I thought of with repeated logging is that you could see the stack trace changing (or not).
> >     
> >     
> >     The stack trace is a possible trace. By the time logging happens, the assignment could have completed and the thread could have moved on to other things.
> 
> Josh Elser wrote:
>     Yeah, since these are running fairly regularly (on the order of seconds), a stuck assignment could get really spammy. Like you point out, there could be value gained from printing out the stack more than once. Maybe I could add some backoff which only warns so often?
>     
>     bq. By the time logging happens, the assignment could have completed and the thread could have moved on to other things.
>     
>     Do you think the message should be updated to be more clear about this? A "Maybe you should look into this" type message?
> 
> kturner wrote:
>     > a stuck assignment could get really spammy
>     
>     I think that spam is probably ok as long as the default is high enough that when it does happen, it's something to be concerned about. Could make the timer check a little less frequently.
>     
>     > Do you think the message should be updated to be more clear about this?
>     
>     I think the compaction code just says it's a possible stack trace. I suppose a good solution would be to have error codes; then the user can look up the error code and get the nitty-gritty details. Can't really put too much info in a log message.
> 
> Josh Elser wrote:
>     bq. Could make the timer check a little less frequently.
>     
>     As long as we have a long threshold for warning about a stuck assignment, we can easily make a longer period on the timer. The timer period dictates the minimum stuck assignment time -- I can update the description with a clarification.
> 
> kturner wrote:
>     I was thinking that once an assignment is considered stuck, each time the timer kicks off a check (I think it's either 5 secs or 1 sec, not sure) it will cause spam. Was thinking this period could be increased to produce less spam. The period of the timer could be a function of tserver.assignment.duration.warning, like 1/4 or 1/2.

bq. The period of the timer could be a function of tserver.assignment.duration.warning, like 1/4 or 1/2.

That would work, unless the user changed the value of the duration warning. It would still fire at the old period (unless I'm much trickier about scheduling the task to run).
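
One way to be "trickier" about scheduling, as a rough sketch only: instead of a fixed-rate timer, the check could reschedule itself as a one-shot task and re-read the warning duration on every run. Everything here is hypothetical and not from the patch: the class name SelfReschedulingCheck, the use of a ScheduledExecutorService in place of the existing timer, and the LongSupplier standing in for a lookup of tserver.assignment.duration.warning. The 1/2 factor is the one suggested above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a check that reschedules itself using the current warning duration. */
class SelfReschedulingCheck implements Runnable {
  private final ScheduledExecutorService executor;
  // Assumption: stands in for reading tserver.assignment.duration.warning (in milliseconds).
  private final LongSupplier warnDurationMillis;
  // Assumption: the actual scan for long-running assignments.
  private final Runnable check;

  SelfReschedulingCheck(ScheduledExecutorService executor, LongSupplier warnDurationMillis,
      Runnable check) {
    this.executor = executor;
    this.warnDurationMillis = warnDurationMillis;
    this.check = check;
  }

  /** Delay until the next check: half of the warning duration, never less than 1 ms. */
  static long nextDelayMillis(long warnMillis) {
    return Math.max(1L, warnMillis / 2);
  }

  @Override
  public void run() {
    try {
      check.run();
    } finally {
      // Re-read the duration on every run so a changed value applies to the next cycle.
      if (!executor.isShutdown()) {
        long delay = nextDelayMillis(warnDurationMillis.getAsLong());
        executor.schedule(this, delay, TimeUnit.MILLISECONDS);
      }
    }
  }
}
```

The first run would be kicked off with a single executor.schedule(...) call. A changed duration only takes effect after the already-scheduled run fires, so the old period can still apply for one more cycle.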

Regardless, I need to think some more about preventing spam.
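
For the backoff idea mentioned earlier in the thread, a minimal sketch of what "only warns so often" might look like, assuming one small state object per stuck assignment. WarnBackoff and its parameters are made-up names for illustration, not part of the patch; the interval between warnings doubles after each one, up to a cap.

```java
/** Hypothetical sketch: decides whether a stuck assignment should be warned about again. */
class WarnBackoff {
  private final long maxIntervalMillis;
  private long currentIntervalMillis;
  private long lastWarnMillis;
  private boolean warned = false;

  WarnBackoff(long initialIntervalMillis, long maxIntervalMillis) {
    this.currentIntervalMillis = initialIntervalMillis;
    this.maxIntervalMillis = maxIntervalMillis;
  }

  /**
   * Returns true if a warning should be logged at the given time. The first call always
   * warns; later calls warn only once the current interval has elapsed since the last
   * warning, and each such warning doubles the interval up to the maximum.
   */
  synchronized boolean shouldWarn(long nowMillis) {
    if (!warned) {
      warned = true;
      lastWarnMillis = nowMillis;
      return true;
    }
    if (nowMillis - lastWarnMillis < currentIntervalMillis) {
      return false;
    }
    lastWarnMillis = nowMillis;
    currentIntervalMillis = Math.min(currentIntervalMillis * 2, maxIntervalMillis);
    return true;
  }
}
```

The periodic check would call shouldWarn(System.currentTimeMillis()) before logging the stack trace, and discard the object once the assignment completes. The stack still gets printed more than once, just less and less often.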


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27654/#review60185
-----------------------------------------------------------


On Nov. 6, 2014, 12:58 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27654/
> -----------------------------------------------------------
> 
> (Updated Nov. 6, 2014, 12:58 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-3304
>     https://issues.apache.org/jira/browse/ACCUMULO-3304
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Watches assignments and reports when an assignment is running for longer than a configured time.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/conf/Property.java 56f3d9c 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/ActiveAssignmentRunnable.java PRE-CREATION 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/RunnableStartedAt.java PRE-CREATION 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 94be0bb 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java 935ffeb 
> 
> Diff: https://reviews.apache.org/r/27654/diff/
> 
> 
> Testing
> -------
> 
> Very minimal.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>

