accumulo-dev mailing list archives

From "Eric Newton (Commented) (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-449) Failed log copy is not restarted
Date Wed, 07 Mar 2012 00:31:58 GMT


Eric Newton commented on ACCUMULO-449:

It does restart, but it takes a long time to time out (an hour?!?).  We need to use an API
to get the status from the logger: using HDFS to communicate is too much of a kludge.
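
A rough sketch of what asking the logger directly could look like, so that a dead or
stalled copy is noticed after a short, configurable timeout instead of an hour.  Everything
here is hypothetical: LoggerStatus, CopyWatcher, Action and the timeout value are made-up
names for illustration, not existing Accumulo classes or the eventual fix.

```java
import java.util.OptionalDouble;

// Hypothetical status API exposed by the logger (an assumption, not Accumulo's interface).
interface LoggerStatus {
    // Fraction of the log copied so far, in [0.0, 1.0]; empty if the logger
    // cannot be reached or no longer knows about this copy.
    OptionalDouble copyProgress(String logId);
}

// What the master should do after one status check.
enum Action { WAIT, RESTART, DONE }

// Tracks one log copy and decides when it has stalled long enough to restart.
final class CopyWatcher {
    private final LoggerStatus logger;
    private final long stallTimeoutMillis;
    private double lastProgress = -1.0;   // nothing observed yet
    private long lastAdvanceMillis;       // last time progress moved forward

    CopyWatcher(LoggerStatus logger, long stallTimeoutMillis, long nowMillis) {
        this.logger = logger;
        this.stallTimeoutMillis = stallTimeoutMillis;
        this.lastAdvanceMillis = nowMillis;
    }

    Action check(String logId, long nowMillis) {
        OptionalDouble status = logger.copyProgress(logId);
        if (status.isPresent()) {
            double progress = status.getAsDouble();
            if (progress >= 1.0) {
                return Action.DONE;
            }
            if (progress > lastProgress) {
                // The copy moved forward, so reset the stall clock.
                lastProgress = progress;
                lastAdvanceMillis = nowMillis;
                return Action.WAIT;
            }
        }
        // No answer, or no forward progress: restart once the stall timeout passes.
        if (nowMillis - lastAdvanceMillis >= stallTimeoutMillis) {
            lastProgress = -1.0;
            lastAdvanceMillis = nowMillis;
            return Action.RESTART;
        }
        return Action.WAIT;
    }
}
```

With something like this, a logger that dies from an OOME stops answering, and the master
would reissue the copy after stallTimeoutMillis rather than printing the same 0.0 progress
indefinitely.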

> Failed log copy is not restarted
> --------------------------------
>                 Key: ACCUMULO-449
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: logger, master
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
> I shut a single node instance down uncleanly.  When I restarted it, the logger did not
> have enough memory to perform the log sort; it got an OOME and died.  I edited
> and gave the logger process more memory.  I restarted the logger process.  However, the log
> recovery never restarted.
> The master was continually printing messages like the following.
> {noformat}
> 06 17:07:16,609 [master.CoordinateRecoveryTask] DEBUG: Copying 65c48045-88c1-48e4-93d3-4865a9a86050 from (for 1210.306000 seconds) 0.0
> {noformat}
> After 20m I restarted the master and then log recovery proceeded.


