hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5060) NN should proactively perform a saveNamespace if it has a huge number of outstanding uncheckpointed transactions
Date Mon, 05 Aug 2013 17:36:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729693#comment-13729693 ]

Aaron T. Myers commented on HDFS-5060:
--------------------------------------

bq. Adding to what Kihwal said, this should be turned off by default.

I'm not so sure about that. What if it were set to a very high threshold by default, say 50
GB of outstanding edit logs? Your concern seems to be about not affecting operators who already
keep a close eye on their systems, but in a properly-functioning, well-monitored system this
threshold should never be hit, so having the feature off entirely seems like overkill to me.
If you get to the point where you have tens or hundreds of gigabytes of edit logs outstanding,
you're likely looking at a multi-hour restart if things go down.
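
To make the "very high default" idea concrete, here is a minimal sketch of what a size-based
guard might look like. This is not actual NameNode code: the property name and the byte count
passed in are hypothetical, purely to illustrate a 50 GB default that a healthy cluster should
never reach.

{code:java}
// Minimal sketch of a size-based guard, NOT actual NameNode code. The
// property name and the caller that supplies uncheckpointedEditLogBytes
// are hypothetical; they only illustrate the "very high default" idea.
public class UncheckpointedEditsGuard {

  // Hypothetical property: bytes of uncheckpointed edit logs the NN will
  // tolerate before forcing a local saveNamespace. Defaults to 50 GB.
  static final String AUTOSAVE_BYTES_KEY = "dfs.namenode.autosave.edits.bytes";
  static final long AUTOSAVE_BYTES_DEFAULT = 50L * 1024 * 1024 * 1024;

  private final long thresholdBytes;

  UncheckpointedEditsGuard(long thresholdBytes) {
    this.thresholdBytes = thresholdBytes;
  }

  /** True if the NN should proactively save its namespace. */
  boolean shouldForceSaveNamespace(long uncheckpointedEditLogBytes) {
    // In a well-monitored cluster this should never trip: a secondary or
    // standby NN will have checkpointed long before 50 GB accumulates.
    return uncheckpointedEditLogBytes >= thresholdBytes;
  }
}
{code}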

bq. I think disrupting a running service is a big problem with the proposed approach. How
often have you seen this issue that warrants a change like this?

Just a handful of times, but when it does happen the event is severe enough that I think it
warrants doing something to address it. We shouldn't let people shoot themselves in the foot.

bq. Why cannot bringing up a secondary/standby be a solution?

Of course that's a solution, and it's the proper long-term solution to this problem. This is
certainly not meant to replace that. The issue is that I've observed, several times, folks
allowing the checkpointing node to be down for an inordinately long time, and even with a
monitoring solution in place that alerts them to the problem, folks don't fully understand the
ramifications of stale checkpoints and a large number of outstanding uncheckpointed edits.

bq. The issue that I have seen (quite infrequently though) is, secondary not being able to
checkpoint due to editlog corruption. I created HDFS-4923 for this; if an operator forgets
to manually save the namespace, during shutdown time the system could save the namespace automatically.
This solves several issues mentioned in the jira.

That's a fine idea as well, but obviously won't help in the event that the shutdown isn't
clean, so it won't completely alleviate this issue.
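
For context, the HDFS-4923 idea boils down to something like the following shutdown-time hook.
This is a loose sketch, not the actual patch; the saveNamespace runnable is a hypothetical
stand-in for whatever the NN would call internally.

{code:java}
// Loose sketch of the HDFS-4923 idea, not the actual patch: save the
// namespace as part of a clean shutdown so a forgotten manual
// "hdfs dfsadmin -saveNamespace" doesn't leave a huge edit-log backlog.
public class SaveNamespaceOnShutdown {

  public static void register(Runnable saveNamespace) {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      // Only helps on a clean shutdown; a crash or kill -9 skips this hook,
      // which is why it can't fully replace a proactive save.
      saveNamespace.run();
    }, "save-namespace-on-shutdown"));
  }
}
{code}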
                
> NN should proactively perform a saveNamespace if it has a huge number of outstanding
uncheckpointed transactions
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5060
>                 URL: https://issues.apache.org/jira/browse/HDFS-5060
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>
> In a properly-functioning HDFS system, checkpoints will be triggered regularly by either the
secondary NN or the standby NN, by default every hour or every 1 million outstanding edit
transactions, whichever comes first. However, in cases where this second node is down for an
extended period of time, the number of outstanding transactions can grow so large as to cause
a restart to take an inordinately long time.
> This JIRA proposes to make the active NN monitor its number of outstanding transactions and
perform a proactive local saveNamespace if that number grows beyond a configurable threshold.
I'm envisioning something like 10x the configured number of transactions that, in a
properly-functioning cluster, would trigger a checkpoint from the second NN. Though this would
be disruptive to clients while it's taking place, likely for a few minutes, it seems better
than the alternative of a subsequent multi-hour restart, and it should never actually occur in
a properly-functioning cluster.
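
To illustrate the proposed check, here is a rough sketch of the trigger logic. It is not the
actual implementation: check() and saveNamespaceLocally() are hypothetical stand-ins, and only
the dfs.namenode.checkpoint.txns default of 1,000,000 is an existing HDFS setting.

{code:java}
// Illustrative sketch only, NOT the actual HDFS implementation. check() and
// saveNamespaceLocally() are hypothetical stand-ins for NN internals.
public class ProactiveSaveNamespaceMonitor {

  // dfs.namenode.checkpoint.txns defaults to 1,000,000; the proposal is to
  // force a local save at roughly 10x whatever value is configured.
  static final long CHECKPOINT_TXNS_DEFAULT = 1_000_000L;
  static final long FORCE_SAVE_MULTIPLIER = 10L;

  private final long forceSaveThreshold;

  ProactiveSaveNamespaceMonitor(long configuredCheckpointTxns) {
    this.forceSaveThreshold = configuredCheckpointTxns * FORCE_SAVE_MULTIPLIER;
  }

  /**
   * Called periodically by the active NN. If the checkpointer (secondary or
   * standby NN) has been down long enough that uncheckpointed transactions
   * exceed the threshold, fall back to a local, disruptive saveNamespace.
   */
  void check(long uncheckpointedTxns) {
    if (uncheckpointedTxns >= forceSaveThreshold) {
      // Briefly disruptive to clients (likely minutes), but far cheaper than
      // a multi-hour restart replaying tens of millions of edits.
      saveNamespaceLocally();
    }
  }

  private void saveNamespaceLocally() {
    // Hypothetical: the real NN would enter safe mode, write the fsimage,
    // and leave safe mode again.
  }
}
{code}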

