hadoop-hdfs-user mailing list archives

From sam liu <samliuhad...@gmail.com>
Subject Re: Questions on rollback/upgrade HDFS with QJM HA enabled
Date Mon, 26 Jan 2015 02:26:11 GMT
Could any expert please help answer the questions?

Thanks in advance!

2015-01-24 21:31 GMT+08:00 sam liu <samliuhadoop@gmail.com>:

> Hi Experts,
> I have questions on rollback/upgrade HDFS with QJM HA enabled.
> On the website
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled,
> it says:
> 'To perform a rollback of an upgrade, both NNs should first be shut down.
> The operator should run the roll back command on the NN where they
> initiated the upgrade procedure, which will perform the rollback on the
> local dirs there, as well as on the shared log, either NFS or on the JNs.
> Afterward, this NN should be started and the operator should run
> `-bootstrapStandby' on the other NN to bring the two NNs in sync with this
> rolled-back file system state.'
> Currently I expect the steps to be as follows (please correct me if I am wrong):
> NN1 -> hadoop namenode -rollback
> NN1 -> hadoop namenode // In our env, the rolled-back namenode shuts down
> right after it finishes -rollback, so it needs to be started again.
> NN2 -> hadoop namenode -bootstrapStandby
> hadoop datanode -rollback // on all datanodes
> [Question 1]:
> One thing I don't know is when the JournalNodes should be started and/or
> stopped. It seems like they should be started for the hadoop namenode
> -rollback. Should they be restarted sometime?
> [Question 2]:
> Another issue actually happens after the upgrade and before rollback
> starts: The standby NN process is actually heavily occupying the CPU and
> somehow is eating up disk space (without the disk space actually being
> used). This was causing "No space left on device" errors during the
> rollback process.  As soon as I killed the namenode process, the disk space
> was immediately back to a reasonable amount.
> What might cause the NN process to occupy so much disk space in such a
> hidden way?
> Thanks!
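
For reference, the rollback order proposed in the quoted message can be written down as a small dry-run sketch. It only prints the commands and does not run anything. The host labels NN1, NN2 and DN and the helper names `step` and `rollback_plan` are placeholders made up for illustration; the hadoop commands are the ones quoted above. It assumes, as the message suggests but does not confirm, that both NameNodes are stopped first and that the JournalNodes are running when the rollback is issued.

```shell
#!/bin/sh
# Dry-run sketch of the rollback order proposed in the quoted message.
# NN1, NN2 and DN are placeholder labels, not real hostnames.
# Assumption (not confirmed in the thread): both NameNodes are stopped
# first, and the JournalNodes are up when "-rollback" is issued.

# Print one step as "<host label>: <command>"; nothing is executed.
step() {
  printf '%s: %s\n' "$1" "$2"
}

rollback_plan() {
  step NN1 "hadoop namenode -rollback"          # roll back local dirs and the shared log
  step NN1 "hadoop namenode"                    # start the rolled-back NameNode again
  step NN2 "hadoop namenode -bootstrapStandby"  # bring the other NameNode in sync
  step DN  "hadoop datanode -rollback"          # repeat on every DataNode
}

rollback_plan
```

Printing the plan first makes it easier to review the order with someone else before touching a live cluster; when and whether the JournalNodes need a restart is still the open question here.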
