hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
Date Tue, 17 Mar 2015 10:26:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364882#comment-14364882 ]

Vinayakumar B edited comment on HDFS-7645 at 3/17/15 10:26 AM:
---------------------------------------------------------------

bq. DNs look for RollingUpgradeStatus in the heartbeat response. If it is absent then DNs
infer that the rolling upgrade is finalized. If the administrator attempts to do a rollback
without stopping all DNs first then clearing trash will cause data loss. 
Even if the administrator does this by mistake, the data loss will be irrecoverable.
To avoid this, how about keeping a finalized {{RollingUpgradeStatus}} in the NameNode
once the upgrade is finalized, instead of setting it to null?
The DNs could then check specifically for the FINALIZED status before clearing the trash.
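
A rough sketch of the difference, only to illustrate the idea. Every name here ({{UpgradeState}}, {{TrashAction}}, {{currentAction}}, {{proposedAction}}) is hypothetical and is not an existing Hadoop class or method; a null argument stands for a heartbeat response that carries no RollingUpgradeStatus.

```java
// Hypothetical model, not Hadoop code: compares how a DN might react to the
// rolling upgrade status in a heartbeat response under the behavior quoted
// above and under the proposal.
class TrashDecisionSketch {

    // Assumed states a heartbeat could report; null means "no status present".
    enum UpgradeState { IN_PROGRESS, FINALIZED }

    enum TrashAction { ENABLE_TRASH, CLEAR_TRASH, LEAVE_TRASH_ALONE }

    // Behavior as described in the quoted text: an absent status is taken
    // to mean the rolling upgrade was finalized.
    static TrashAction currentAction(UpgradeState status) {
        return status == null ? TrashAction.CLEAR_TRASH : TrashAction.ENABLE_TRASH;
    }

    // Proposed behavior: only an explicit FINALIZED status clears the trash.
    // An absent status (for example after a rollback on the NN) changes nothing.
    static TrashAction proposedAction(UpgradeState status) {
        if (status == UpgradeState.FINALIZED) {
            return TrashAction.CLEAR_TRASH;
        }
        if (status == UpgradeState.IN_PROGRESS) {
            return TrashAction.ENABLE_TRASH;
        }
        return TrashAction.LEAVE_TRASH_ALONE;
    }

    public static void main(String[] args) {
        System.out.println("absent status, current:  " + currentAction(null));
        System.out.println("absent status, proposed: " + proposedAction(null));
        System.out.println("FINALIZED, proposed:     " + proposedAction(UpgradeState.FINALIZED));
    }
}
```

With this shape, a DN that is still running while the administrator rolls back would see no status and would keep its trash, so the blocks needed for the rollback would survive.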

Any thoughts?



> Rolling upgrade is restoring blocks from trash multiple times
> -------------------------------------------------------------
>
>                 Key: HDFS-7645
>                 URL: https://issues.apache.org/jira/browse/HDFS-7645
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Keisuke Ogiwara
>         Attachments: HDFS-7645.01.patch, HDFS-7645.02.patch, HDFS-7645.03.patch, HDFS-7645.04.patch
>
>
> When performing an HDFS rolling upgrade, the trash directory is restored twice,
> when under normal circumstances it should not need to be restored at all. IIUC, the only time
> these blocks should be restored is when a rolling upgrade has to be rolled back.
> On a busy cluster, this can cause significant and unnecessary block churn, both on the
> datanodes and, more importantly, in the namenode.
> The two times this happens are:
> 1) Restart of the DN onto the new software
> {code}
>   private void doTransition(DataNode datanode, StorageDirectory sd,
>       NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
>     if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
>       Preconditions.checkState(!getTrashRootDir(sd).exists(),
>           sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " +
>           " both be present.");
>       doRollback(sd, nsInfo); // rollback if applicable
>     } else {
>       // Restore all the files in the trash. The restored files are retained
>       // during rolling upgrade rollback. They are deleted during rolling
>       // upgrade downgrade.
>       int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
>       LOG.info("Restored " + restored + " block files from trash.");
>     }
> {code}
> 2) When the heartbeat response no longer indicates that a rolling upgrade is in progress
> {code}
>   /**
>    * Signal the current rolling upgrade status as indicated by the NN.
>    * @param inProgress true if a rolling upgrade is in progress
>    */
>   void signalRollingUpgrade(boolean inProgress) throws IOException {
>     String bpid = getBlockPoolId();
>     if (inProgress) {
>       dn.getFSDataset().enableTrash(bpid);
>       dn.getFSDataset().setRollingUpgradeMarker(bpid);
>     } else {
>       dn.getFSDataset().restoreTrash(bpid);
>       dn.getFSDataset().clearRollingUpgradeMarker(bpid);
>     }
>   }
> {code}
> HDFS-6800 and HDFS-6981 modified this behavior, so it is not completely clear whether
> this is intentional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
