accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4090) BatchWriter close not cleaning up all resources
Date Wed, 23 Dec 2015 18:11:46 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069981#comment-15069981 ]

Eric Newton commented on ACCUMULO-4090:
---------------------------------------

I wrote a throw-away test to try to simulate the behavior.

Modified TabletServer to close any update sessions in flush.

I instrumented TabletServerBatchWriter to keep count of the number of instances of TSBWs.
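
A minimal sketch of this kind of instance counting, not the actual change made to TabletServerBatchWriter. `TrackedWriter` and `liveCount` are hypothetical names, and registering a weak reference in the constructor is only one assumed way to do the counting:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the instrumented class; this is not Accumulo code.
final class TrackedWriter {
  // Weak references do not keep instances alive, so the collector can still
  // reclaim them while we remain able to count the survivors.
  private static final List<WeakReference<TrackedWriter>> INSTANCES = new ArrayList<>();

  TrackedWriter() {
    synchronized (INSTANCES) {
      INSTANCES.add(new WeakReference<>(this));
    }
  }

  // Number of instances that have not been garbage collected yet.
  static int liveCount() {
    synchronized (INSTANCES) {
      // Drop entries whose referent has already been reclaimed.
      INSTANCES.removeIf(ref -> ref.get() == null);
      return INSTANCES.size();
    }
  }
}
```

Because only weak references are held, the counter itself cannot be the reason an instance stays in the heap.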

I wrote a client tester that wrote data and let it flush with latency (because that's what the
production client is doing). Then I wrote some more mutations and closed it.  As expected,
I got a MutationsRejectedException.  Then I closed the BatchWriter and got another MutationsRejectedException.


I repeated the process 10 times, ran the GC, waited, ran the GC again, and verified that all
the TSBWs were gone.
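
The GC-then-verify step could look roughly like the following. `GcCheck`, `allCollected`, and `countLive` are hypothetical helpers, not part of Accumulo or of the test described here. `System.gc()` is only a request to the JVM, so a `false` result after a few attempts is suggestive rather than conclusive:

```java
import java.lang.ref.WeakReference;
import java.util.List;

// Hypothetical helper for checking that tracked objects were reclaimed.
final class GcCheck {
  // Requests a GC up to maxAttempts times, pausing between attempts, and
  // returns true once every referent in refs has been reclaimed.
  static boolean allCollected(List<? extends WeakReference<?>> refs,
                              int maxAttempts, long pauseMillis) {
    for (int i = 0; i < maxAttempts; i++) {
      System.gc(); // a hint only; the JVM may ignore it
      try {
        Thread.sleep(pauseMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        break;
      }
      if (countLive(refs) == 0) {
        return true;
      }
    }
    return countLive(refs) == 0;
  }

  // Counts referents that are still reachable through their weak reference.
  static int countLive(List<? extends WeakReference<?>> refs) {
    int live = 0;
    for (WeakReference<?> ref : refs) {
      if (ref.get() != null) {
        live++;
      }
    }
    return live;
  }
}
```

An object that is still strongly referenced somewhere will never be reported as collected, which is the symptom the heap dump in the issue description showed.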



> BatchWriter close not cleaning up all resources
> -----------------------------------------------
>
>                 Key: ACCUMULO-4090
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4090
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.7.0
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>
> I'm debugging an issue with a long-running ingestor, similar to the TraceServer.
> After realizing that BatchWriter close needs to be called when a MutationsRejectedException
> occurs (see ACCUMULO-4088), a close was added, and the client became more stable.
> However, after a day or so, the client became sluggish. When inspecting a heap dump,
> many TabletServerBatchWriter objects were still referenced.  This server should only have
> two BatchWriter instances at any one time, and this server had >100.
> Still debugging.
> The error that initiates the issue is a SessionID not found, presumably because the session
> timed out.  This is the cause of the MutationsRejectedException seen by the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
