hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4114) Remove the CheckpointNode
Date Fri, 02 Nov 2012 18:23:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489627#comment-13489627 ]

Todd Lipcon commented on HDFS-4114:

> I see BackupNode as a better way of creating checkpoints. SNN uploads the image and the edits
> from NN, merges them in memory and then sends back the new checkpoint.
> BN needs only to saveNamespace() from memory and then sends back the new image. This reduces
> the network traffic and local disk IOs on the upload of two large files. I have seen on multiple
> large clusters NameNode running much slower, when the checkpoint is in progress.
> It is beneficial for HDFS performance to switch from SNN to BN for checkpointing. Therefore
> I would advocate re-re-deprecating SNN instead of removing BN.

This argument seems to be predicated on an idea that the SecondaryNameNode doesn't keep the
image in memory between checkpoints, and that it downloads the image from the NN anew for
each checkpoint. This hasn't been the case since HDFS-1458 in 0.23, which made a small improvement
to the 2NN to solve the problem you're pointing out.
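
To make the difference concrete, here is a toy sketch of the two behaviours being contrasted: a checkpointer that re-downloads the whole image for every checkpoint, and one that keeps the merged image in memory and fetches only the edits logged since its last checkpoint. This is not Hadoop code. ToyNameNode, ToyCheckpointer, simulate and the "records downloaded" counter are all hypothetical names invented for illustration, and the real 2NN has additional checks that are not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: a checkpointed image plus an edit log."""
    image: dict = field(default_factory=dict)   # namespace as of the last checkpoint
    image_txid: int = 0                         # last transaction folded into the image
    edits: list = field(default_factory=list)   # (txid, path, value) logged since then

    def log_edit(self, path, value):
        txid = self.image_txid + len(self.edits) + 1
        self.edits.append((txid, path, value))

    def accept_checkpoint(self, image, txid):
        # Install the merged image and drop the edits it already covers.
        self.image = dict(image)
        self.edits = [e for e in self.edits if e[0] > txid]
        self.image_txid = txid


class ToyCheckpointer:
    """Hypothetical checkpointer; the flag selects the behaviour being compared."""

    def __init__(self, keep_image_in_memory):
        self.keep_image_in_memory = keep_image_in_memory
        self.image = None
        self.image_txid = 0
        self.records_downloaded = 0  # crude proxy for network transfer from the NN

    def checkpoint(self, nn):
        if self.image is None or not self.keep_image_in_memory:
            # Fetch the full image again (always needed the first time).
            self.image = dict(nn.image)
            self.image_txid = nn.image_txid
            self.records_downloaded += len(nn.image)
        # In both modes, fetch only the edits newer than the image we hold.
        new_edits = [e for e in nn.edits if e[0] > self.image_txid]
        self.records_downloaded += len(new_edits)
        for txid, path, value in new_edits:
            self.image[path] = value
            self.image_txid = txid
        # Send the merged image back as the new checkpoint.
        nn.accept_checkpoint(self.image, self.image_txid)


def simulate(keep_image_in_memory, rounds=3, edits_per_round=10, image_size=1000):
    """Run a few checkpoint rounds and return the total records downloaded."""
    nn = ToyNameNode(image={f"/f{i}": 0 for i in range(image_size)})
    ckpt = ToyCheckpointer(keep_image_in_memory)
    for r in range(rounds):
        for i in range(edits_per_round):
            nn.log_edit(f"/f{i}", r + 1)  # overwrite existing paths only
        ckpt.checkpoint(nn)
        assert ckpt.image == nn.image     # both sides agree after each checkpoint
    return ckpt.records_downloaded


if __name__ == "__main__":
    print("re-download every time:", simulate(False))
    print("image kept in memory:  ", simulate(True))
```

In this model both variants still send the merged image back to the NN, so the saving is only on the download side; the sizes used are arbitrary and say nothing about real cluster numbers.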

> I would be glad to go into design discussion and potential enhancements of BackupNode with
> you. Would appreciate it given your experience with HA, as I believe the HA story for Hadoop
> isn't over with the implementation of Quorum Journal.

Feel free to ping me if you have any questions on the HA design or implementation - always
happy to help.

> Although this issue is not about it. Sticking to the point, what are your arguments for removing
> (or better say deprecating) BN besides that it has bugs? Software tends to have bugs. E.g.
> you do not propose to remove BlockScanner just because it couldn't been fixed over a series

The BackupNode doesn't provide any feature that is not provided better by other pieces of
code. Your argument about efficiency isn't valid given HDFS-1458.

The BlockScanner argument is a silly one: it has had some bugs, but there is no alternative
available which _doesn't_ have bugs, so a buggy piece of code is better than no piece of code.
If someone had written a new BlockScanner which offered more features and fewer bugs, I'd
absolutely advocate removing the old one.

> Remove the CheckpointNode
> -------------------------
>                 Key: HDFS-4114
>                 URL: https://issues.apache.org/jira/browse/HDFS-4114
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eli Collins
>            Assignee: Eli Collins
> Per the thread on hdfs-dev@ (http://s.apache.org/tMT) let's remove the BackupNode and

