accumulo-notifications mailing list archives

From "Dave Marion (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4090) BatchWriter close not cleaning up all resources
Date Wed, 23 Dec 2015 16:01:46 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069780#comment-15069780 ]

Dave Marion commented on ACCUMULO-4090:
---------------------------------------

Looking at a heap dump, I consistently see two objects in the queue for the jtimer object:
a FailedMutations object and an anonymous timer task. I believe the following should be done:

 1. When TSBW.close() (TabletServerBatchWriter.close()) is called, FailedMutations.cancel()
should be called as well.
 2. A reference should be kept to the TimerTask added to jtimer in the TSBW constructor. Then,
in TSBW.close(), the cancel() method should be called on this task.
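
The two steps above can be sketched roughly as follows. This is only an illustration, not the actual TabletServerBatchWriter code: SketchBatchWriter, its field names, and the empty task bodies are hypothetical, and it assumes FailedMutations is a java.util.TimerTask (which the heap dump suggests, since it sits in the jtimer queue).

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical stand-in for TabletServerBatchWriter; not the real Accumulo class.
class SketchBatchWriter implements AutoCloseable {
  private final TimerTask failedMutations; // stands in for the FailedMutations task
  private final TimerTask periodicTask;    // formerly anonymous, now kept in a field
  private boolean closed = false;          // guarded by "this"

  SketchBatchWriter(Timer jtimer, long periodMillis) {
    this.failedMutations = new TimerTask() {
      @Override
      public void run() { /* placeholder: retry failed mutations */ }
    };
    this.periodicTask = new TimerTask() {
      @Override
      public void run() { /* placeholder: periodic work */ }
    };
    // Both tasks go onto the shared timer, which holds references to them.
    jtimer.schedule(failedMutations, periodMillis, periodMillis);
    jtimer.schedule(periodicTask, periodMillis, periodMillis);
  }

  @Override
  public synchronized void close() {
    if (closed) {
      return; // closing twice is a no-op
    }
    closed = true;
    // Step 1: cancel the FailedMutations task.
    failedMutations.cancel();
    // Step 2: cancel the task whose reference was kept in the constructor.
    periodicTask.cancel();
  }

  synchronized boolean isClosed() {
    return closed;
  }
}
```

Note that a cancelled TimerTask stays in the Timer's queue until its next scheduled time arrives or Timer.purge() is called, so with long periods a purge() after close() may be worth considering.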

Looking at the TabletServerBatchWriter objects in the heap dump, I see that the closed field
is always false. I wonder if the root cause is that this field is not marked as volatile (the
flushing field may have the same issue).
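
For illustration only, here is what marking such a flag volatile buys: a write from the closing thread is guaranteed to become visible to a background thread that polls the flag. ClosedFlagSketch and its methods are hypothetical names, and this does not confirm that visibility is the root cause here.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch; not the real TabletServerBatchWriter.
class ClosedFlagSketch {
  // volatile: a write by one thread is visible to later reads by other threads.
  private volatile boolean closed = false;

  void close() {
    closed = true;
  }

  // Starts a watcher that spins until it observes closed == true, then closes
  // from the calling thread and reports whether the watcher terminated in time.
  static boolean demo() {
    ClosedFlagSketch sketch = new ClosedFlagSketch();
    CountDownLatch started = new CountDownLatch(1);
    Thread watcher = new Thread(() -> {
      started.countDown();
      while (!sketch.closed) {
        Thread.yield(); // without volatile, this loop is not guaranteed to exit
      }
    });
    watcher.setDaemon(true);
    watcher.start();
    try {
      started.await();    // make sure the watcher is already polling
      sketch.close();     // write the flag from a different thread
      watcher.join(5000); // wait up to five seconds for the watcher to see it
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !watcher.isAlive();
  }
}
```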

> BatchWriter close not cleaning up all resources
> -----------------------------------------------
>
>                 Key: ACCUMULO-4090
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4090
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.7.0
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>
> I'm debugging an issue with a long-running ingestor, similar to the TraceServer.
> After realizing that BatchWriter close needs to be called when a MutationsRejectedException
> occurs (see ACCUMULO-4088), a close was added, and the client became more stable.
> However, after a day, or so, the client became sluggish. When inspecting a heap dump,
> many TabletServerBatchWriter objects were still referenced.  This server should only have
> two BatchWriter instances at any one time, and this server had >100.
> Still debugging.
> The error that initiates the issue is a SessionID not found, presumably because the session
> timed out.  This is the cause of the MutationsRejectedException seen by the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
