accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1708) Error during minor compaction left tserver in bad state
Date Wed, 08 Jan 2014 15:15:51 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865531#comment-13865531 ]

Keith Turner commented on ACCUMULO-1708:
----------------------------------------

bq. in bin/accumulo and the ability for users to define their own ACCUMULO_KILL_CMD in accumulo-env.sh,
is this even as big of a worry as previously?

The out-of-memory kill flag was present, but unfortunately it did not kick in.  See the last
sentence in the ticket description.  I think that flag may only kick in for heap allocation
errors.  However, other Java code also throws OOME.  For example, when Java cannot create a new
thread it throws OOME, but this does not seem to trigger {{-XX:OnOutOfMemoryError}}.

I think the best course of action is to modify the Accumulo code to catch Error and halt in the
threads it creates (maybe using thread groups to do this).  For the threads created by ZooKeeper
and HDFS, we can create follow-on Accumulo, ZooKeeper, and Hadoop tickets as needed.  I am not
sure I will have time to do this for 1.6, so I may push it to 1.7.
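
A rough sketch of the thread-group idea, to make the proposal concrete.  This is not Accumulo
code: the class name {{HaltingThreadGroup}} and the injectable {{haltAction}} are hypothetical,
and the default of {{Runtime.getRuntime().halt(1)}} is only one possible way to stop the process.
It relies on the standard behavior that {{ThreadGroup.uncaughtException}} is called for a thread
in the group that dies from an uncaught Throwable and has no handler of its own.

```java
// Hypothetical sketch, not Accumulo code: a thread group that halts the
// process when any of its threads dies from an uncaught Error (such as an
// OutOfMemoryError thrown because a native thread could not be created).
class HaltingThreadGroup extends ThreadGroup {

    // What to do once an Error has escaped a thread. Injectable so the
    // behavior can be exercised without actually killing the JVM.
    private final Runnable haltAction;

    HaltingThreadGroup(String name, Runnable haltAction) {
        super(name);
        this.haltAction = haltAction;
    }

    // Assumed default: halt immediately, skipping shutdown hooks.
    HaltingThreadGroup(String name) {
        this(name, () -> Runtime.getRuntime().halt(1));
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof Error) {
            try {
                // Logging may itself fail when memory or threads are exhausted,
                // so the halt action sits in a finally block.
                System.err.println("Fatal error in thread " + t.getName() + ": " + e);
            } finally {
                haltAction.run();
            }
        } else {
            // Non-Error throwables keep the default ThreadGroup handling.
            super.uncaughtException(t, e);
        }
    }
}
```

Threads would then be created with {{new Thread(group, runnable)}} so that they belong to the
group.  Threads created by other libraries would not be covered, which is why the follow-on
tickets mentioned above would still be needed.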

> Error during minor compaction left tserver in bad state
> -------------------------------------------------------
>
>                 Key: ACCUMULO-1708
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1708
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Keith Turner
>            Priority: Critical
>             Fix For: 1.6.0
>
>         Attachments: ThreadTest.java
>
>
> A tserver experienced an OOME during minor compaction.  This OOME was thrown because Java
> could not create a native thread.  Minor compactions only catch declared exceptions and RuntimeExceptions.
> This left the system in a state where the compaction was not running but the tserver thought
> it was.  This caused "flush -w" to hang and prevented the tserver from reclaiming memory.
> For whatever reason the OOME handler that kills the process did not kick in (it seems it
> only kicks in for OOMEs related to heap allocation).
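
The failure mode in the ticket description can be illustrated with a small sketch.  It is not
the actual minor compaction code: the class {{CompactionStateSketch}}, its {{running}} flag, and
both methods are hypothetical stand-ins.  The point is only that a catch clause for
RuntimeException never sees an Error, so state that is reset only on the handled paths stays
stuck when an OOME is thrown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Accumulo code: shows how a "running" flag can be
// left set when an Error bypasses a catch block that only handles exceptions.
class CompactionStateSketch {

    final AtomicBoolean running = new AtomicBoolean(false);

    // Flag is cleared only on normal completion or on a RuntimeException.
    // An Error (for example OutOfMemoryError) propagates past the catch
    // and leaves running == true.
    void runLeaky(Runnable work) {
        running.set(true);
        try {
            work.run();
            running.set(false);
        } catch (RuntimeException e) {
            running.set(false);
        }
    }

    // Flag is cleared in a finally block, so it is reset on every exit
    // path, including when an Error propagates to the caller.
    void runSafe(Runnable work) {
        running.set(true);
        try {
            work.run();
        } finally {
            running.set(false);
        }
    }
}
```

Resetting state in a finally block addresses only the bookkeeping.  Whether the process should
keep running at all after such an Error is a separate question.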



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
