accumulo-dev mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Review Request 27654: Add introspection of long running assignments
Date Fri, 07 Nov 2014 00:45:59 GMT
https://issues.apache.org/jira/browse/ACCUMULO-3311



On Thu, Nov 6, 2014 at 7:29 PM, Eric Newton <eric.newton@gmail.com> wrote:

> It would be nice to model "Danger!" messages with "All Clear!" directly.
>
> I'll make a ticket.
>
> On Thu, Nov 6, 2014 at 3:47 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>>
>>
>> > On Nov. 6, 2014, 5:47 p.m., kturner wrote:
>> > >
>> server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java,
>> line 250
>> > > <
>> https://reviews.apache.org/r/27654/diff/3/?file=751140#file751140line250>
>> > >
>> > >     The compaction code remembers when it logged an exception and
>> does not do it again.  It also logs a message if the compaction becomes
>> unstuck.  An advantage I thought of with repeated logging is that you
>> could see the stack trace changing (or not).
>> > >
>> > >
>> > >     The stack trace is a possible trace.  By the time logging
>> happens, the assignment could have completed and the thread could have
>> moved on to other things.
>> >
>> > Josh Elser wrote:
>> >     Yeah, since these are running fairly regularly (order of seconds) a
>> stuck assignment could get really spammy. Like you point out, there could
>> be value gained from printing out the stack more than once. Maybe I could
>> add some backoff which only warns so often?
>> >
>> >     bq. By the time logging happens, the assignment could have
>> completed and the thread could have moved on to other things.
>> >
>> >     Do you think the message should be updated to be more clear about
>> this? A "Maybe you should look into this" type message?
>> >
>> > kturner wrote:
>> >     > a stuck assignment could get really spammy
>> >
>> >     I think that spam is probably ok as long as the default is high
>> enough such that when it does happen, it's something to be concerned about.
>> Could make the timer check a little less frequently.
>> >
>> >     > Do you think the message should be updated to be more clear about
>> this?
>> >
>> >     I think the compaction code just says it's a possible stack trace.  I
>> suppose a good solution would be to have error codes, then the user can look
>> up the error code and get the nitty gritty details.  Can't really put too
>> much info in a log message.
>> >
>> > Josh Elser wrote:
>> >     bq. Could make the timer check a little less frequently.
>> >
>> >     As long as we have a long threshold for warning about a stuck
>> assignment, we can easily make a longer period on the timer. The timer
>> period dictates the minimum stuck assignment time -- I can update the
>> description with a clarification.
>> >
>> > kturner wrote:
>> >     I was thinking that once an assignment is considered stuck,
>> each time the timer kicks off a check (I think it's either 5 secs or 1 sec,
>> not sure) it will cause spam.  Was thinking the timer period could be
>> increased to produce less spam.  The period of the timer could be a function
>> of tserver.assignment.duration.warning, like 1/4 or 1/2.
>>
>> bq. The period of the timer could be a function of
>> tserver.assignment.duration.warning, like 1/4 or 1/2.
>>
>> That would work, unless the user changed the value of the duration
>> warning. It would still fire at the old period (unless I'm much trickier
>> about scheduling the task to run).
>>
>> Regardless, I need to think some more about preventing spam.
>>
>>
>> - Josh
>>
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> https://reviews.apache.org/r/27654/#review60185
>> -----------------------------------------------------------
>>
>>
>> On Nov. 6, 2014, 12:58 a.m., Josh Elser wrote:
>> >
>> > -----------------------------------------------------------
>> > This is an automatically generated e-mail. To reply, visit:
>> > https://reviews.apache.org/r/27654/
>> > -----------------------------------------------------------
>> >
>> > (Updated Nov. 6, 2014, 12:58 a.m.)
>> >
>> >
>> > Review request for accumulo.
>> >
>> >
>> > Bugs: ACCUMULO-3304
>> >     https://issues.apache.org/jira/browse/ACCUMULO-3304
>> >
>> >
>> > Repository: accumulo
>> >
>> >
>> > Description
>> > -------
>> >
>> > Watches assignments and reports when an assignment is running for
>> longer than a configured time.
>> >
>> >
>> > Diffs
>> > -----
>> >
>> >   core/src/main/java/org/apache/accumulo/core/conf/Property.java 56f3d9c
>> >
>>  server/tserver/src/main/java/org/apache/accumulo/tserver/ActiveAssignmentRunnable.java
>> PRE-CREATION
>> >
>>  server/tserver/src/main/java/org/apache/accumulo/tserver/RunnableStartedAt.java
>> PRE-CREATION
>> >
>>  server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java
>> 94be0bb
>> >
>>  server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java
>> 935ffeb
>> >
>> > Diff: https://reviews.apache.org/r/27654/diff/
>> >
>> >
>> > Testing
>> > -------
>> >
>> > Very minimal.
>> >
>> >
>> > Thanks,
>> >
>> > Josh Elser
>> >
>> >
>>
>>
>
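
For illustration only, here is a minimal sketch of the two ideas raised in the quoted review: deriving the check period from the warning threshold (1/4 of tserver.assignment.duration.warning) and backing off so that a stuck assignment is not reported on every check. The class StuckAssignmentWarner and its methods are hypothetical names invented for this sketch; they are not part of the patch under review or of Accumulo, and the doubling backoff is just one possible policy.

```java
/**
 * Hypothetical helper, not taken from the Accumulo patch under review.
 * It decides whether a "stuck assignment" warning should be logged now,
 * doubling the quiet interval after each warning up to a fixed maximum.
 */
public class StuckAssignmentWarner {
    private final long initialIntervalMs;
    private final long maxIntervalMs;
    private long currentIntervalMs;
    private long lastWarnMs;
    private boolean warnedBefore = false;

    public StuckAssignmentWarner(long initialIntervalMs, long maxIntervalMs) {
        this.initialIntervalMs = initialIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.currentIntervalMs = initialIntervalMs;
    }

    /** Check period as a fraction (1/4) of the warning threshold, at least 1 ms. */
    public static long checkPeriodMs(long warnThresholdMs) {
        return Math.max(1L, warnThresholdMs / 4);
    }

    /** Returns true if a warning should be logged at time nowMs. */
    public synchronized boolean shouldWarn(long nowMs) {
        if (warnedBefore && nowMs - lastWarnMs < currentIntervalMs) {
            return false; // still inside the quiet interval
        }
        if (warnedBefore) {
            // Back off: double the quiet interval, capped at the maximum.
            currentIntervalMs = Math.min(maxIntervalMs, currentIntervalMs * 2);
        }
        warnedBefore = true;
        lastWarnMs = nowMs;
        return true;
    }

    /** Call when the assignment completes, so the next stuck one warns promptly. */
    public synchronized void reset() {
        warnedBefore = false;
        currentIntervalMs = initialIntervalMs;
    }
}
```

Because the period is recomputed from the threshold on demand, a caller that reschedules its own check after each run (rather than using a fixed-rate timer) would pick up a changed threshold on the next run. That is an assumption about how such a helper might be used, not a description of the patch.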
