accumulo-notifications mailing list archives

From "Dave Marion (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4004) open WALs prevent DN decommissioning
Date Wed, 30 Mar 2016 12:36:26 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217898#comment-15217898 ]

Dave Marion commented on ACCUMULO-4004:
---------------------------------------

Basically decommissioning is broken right now in Hadoop 2.

WALogs stay open until they hit the size threshold, which could take many hours or days in some
cases. These open files will prevent a DN from finishing its decommissioning process[1]. If
you stop the DN anyway, the WALog file will not be closed and you could lose data. You have
to find the tservers that are writing to the WALog and stop them so that the WALog is closed.

There is also another nasty bug[2] where the NN gives clients old locations of blocks that
have been moved due to decommissioning. As you can imagine, this can create all kinds of problems.
Then there is [3], with all of its related issues.

With this patch, you can set the max age to the amount of time you are willing to wait for
a DN to decommission (if you choose to take the risk of hitting [2]).
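
To illustrate the idea behind the max age setting, here is a minimal sketch of a roll decision that closes a log once it is either too large or has been open too long. This is not the code from the attached patch: the class name, method name, and the threshold values are hypothetical and chosen only for illustration.

```java
import java.time.Duration;

/**
 * Hypothetical sketch of a size-or-age WAL roll policy.
 * Not Accumulo's actual implementation; all names are assumptions.
 */
final class WalRollPolicy {
    private final long maxSizeBytes; // existing behavior: roll when the log gets this big
    private final Duration maxAge;   // added behavior: also roll when the log gets this old

    WalRollPolicy(long maxSizeBytes, Duration maxAge) {
        this.maxSizeBytes = maxSizeBytes;
        this.maxAge = maxAge;
    }

    /** Returns true if the open log should be closed and a new one started. */
    boolean shouldRoll(long bytesWritten, Duration openFor) {
        boolean tooBig = bytesWritten >= maxSizeBytes;
        boolean tooOld = openFor.compareTo(maxAge) >= 0;
        return tooBig || tooOld;
    }

    public static void main(String[] args) {
        // Assumed values: 1 GiB size threshold, 1 hour max age.
        WalRollPolicy policy = new WalRollPolicy(1L << 30, Duration.ofHours(1));

        // Small, young log: stays open.
        System.out.println(policy.shouldRoll(10_000_000L, Duration.ofMinutes(5))); // false

        // Small log on a quiet tserver that has been open for two hours:
        // the age check closes it even though the size threshold was never hit.
        System.out.println(policy.shouldRoll(10_000_000L, Duration.ofHours(2)));   // true
    }
}
```

With a size-only check, the second case would leave the file open indefinitely on a lightly loaded tserver, which is the situation that blocks a DN from finishing decommissioning.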

[1] https://issues.apache.org/jira/browse/HDFS-3599
[2] https://issues.apache.org/jira/browse/HDFS-8208
[3] https://issues.apache.org/jira/browse/HDFS-8406

> open WALs prevent DN decommissioning
> ------------------------------------
>
>                 Key: ACCUMULO-4004
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4004
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.6.6, 1.7.2, 1.8.0
>
>         Attachments: ACCUMULO-4004-1.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It should be possible to manually roll WALs so that files on decommissioning datanodes
> are closed and the decommissioning process can complete. At the very least, the logs could
> be closed after an elapsed period of time, such as an hour.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
