hadoop-hdfs-issues mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3743) QJM: improve formatting behavior for JNs
Date Mon, 11 Apr 2016 20:22:26 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235906#comment-15235906
] 

Jian Fang commented on HDFS-3743:
---------------------------------

I implemented the "-newEditsOnly" option and tested it on a cluster with 3 instances that
ran the server-side daemons, such as name nodes, resource managers, and journal nodes.
I successfully replaced two of the three instances, with the name node running fine, by issuing
the command "hdfs namenode -initializeSharedEdits -newEditsOnly" before starting a name node.
However, I ran into an issue when I replaced the last remaining original instance, i.e., after
replacing the original m1, m2, and m3 with m4, m5, and m6 one at a time. The name node then
got stuck loading the fsimage.

The error message was as follows.

"
2016-04-04 06:51:34,612 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode (main):
Loading 1 INodes.
2016-04-04 06:51:34,670 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf
(main): Loaded FSImage in 0 seconds.
2016-04-04 06:51:34,670 INFO org.apache.hadoop.hdfs.server.namenode.FSImage (main): Loaded
image for txid 0 from /mnt/var/lib/hadoop/dfs-name/current/fsimage_0000000000000000000
2016-04-04 06:51:34,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage (main): Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3af6b48f expecting start
txid #1
2016-04-04 06:51:34,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage (main): Start
loading edits file http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280
2016-04-04 06:51:34,683 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream (main):
Fast-forwarding stream 'http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280'
to transaction ID 1
2016-04-04 06:51:34,683 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream (main):
Fast-forwarding stream 'http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280'
to transaction ID 1
2016-04-04 06:51:34,845 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem (main): Encountered
exception loading fsimage
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got
txid 19.
at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:814)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:798)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1491)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1557)
"

It seems the fsimage got into a bad state during the replacement. Could someone shed some light
on what could possibly be wrong?
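To clarify what the name node is complaining about: after loading the fsimage for txid 0, it expects the first edits segment to start at txid 1, but the only segment available from the journal nodes starts at txid 19. A simplified model of that continuity check (illustrative only, not Hadoop's actual code):

```java
import java.io.IOException;

public class EditLogGapCheck {
    // Simplified model (not Hadoop's actual code) of the NameNode's startup
    // continuity check: after applying an fsimage at lastAppliedTxId, the next
    // edits segment must begin at lastAppliedTxId + 1 or loading aborts.
    static void checkSegmentStart(long lastAppliedTxId, long segmentStartTxId)
            throws IOException {
        long expected = lastAppliedTxId + 1;
        if (segmentStartTxId != expected) {
            throw new IOException("There appears to be a gap in the edit log. "
                + "We expected txid " + expected + ", but got txid " + segmentStartTxId);
        }
    }

    public static void main(String[] args) {
        try {
            // fsimage_0000000000000000000 means lastAppliedTxId = 0, but the
            // surviving journal nodes only hold a segment starting at txid 19.
            checkSegmentStart(0, 19);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In my case the name node was starting from a fresh txid-0 image, so any first segment that does not start at txid 1 would trigger exactly this failure.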

In my "-newEditsOnly" logic, I skipped the following code snippet in the initializeSharedEdits()
method, because the other name node could still be running as the active one when the above
command was issued. I assumed that the newly formatted journal node could rely on the QJM
recovery protocol to sync the latest transactions from its peers.

----------------------------------------------------------------------------------------------
        NNStorage newSharedStorage = sharedEditsImage.getStorage();

        // Call Storage.format instead of FSImage.format here, since we don't
        // actually want to save a checkpoint - just prime the dirs with
        // the existing namespace info
        newSharedStorage.format(nsInfo);
        sharedEditsImage.getEditLog().formatNonFileJournals(nsInfo);

        // Need to make sure the edit log segments are in good shape to initialize
        // the shared edits dir.
        fsns.getFSImage().getEditLog().close();
        fsns.getFSImage().getEditLog().initJournalsForWrite();
        fsns.getFSImage().getEditLog().recoverUnclosedStreams();

        copyEditLogSegmentsToSharedDir(fsns, sharedEditsDirs, newSharedStorage,
            conf);
--------------------------------------------------------------------------------------------
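To make the change concrete, the guard I added can be sketched roughly as follows. The names and steps here are illustrative only (the real code operates on NNStorage and FSEditLog as in the snippet above); the point is simply that with -newEditsOnly the format calls still run, while closing the local edit log and copying the existing segments are skipped:

```java
import java.util.ArrayList;
import java.util.List;

public class InitializeSharedEditsSketch {
    // Rough sketch (illustrative names, not Hadoop's actual API) of how the
    // -newEditsOnly option changes initializeSharedEdits(): the shared storage
    // and non-file journals are still formatted, but the copy of existing edit
    // log segments is skipped, on the assumption that the QJM recovery protocol
    // will sync the new journal node from its quorum peers.
    static List<String> plan(boolean newEditsOnly) {
        List<String> steps = new ArrayList<>();
        steps.add("newSharedStorage.format(nsInfo)");
        steps.add("formatNonFileJournals(nsInfo)");
        if (!newEditsOnly) {
            // Only safe while the quorum still holds every earlier segment.
            steps.add("editLog.close()");
            steps.add("editLog.initJournalsForWrite()");
            steps.add("editLog.recoverUnclosedStreams()");
            steps.add("copyEditLogSegmentsToSharedDir(...)");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(true));   // -newEditsOnly path: format only
        System.out.println(plan(false));  // default path: format plus full copy
    }
}
```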
On the newly formatted journal node, I did observe that the segments were recovered successfully
from its peers, for example:

2016-04-11 19:06:38,894 WARN org.apache.hadoop.hdfs.server.common.Storage (IPC Server handler
0 on 8485): Storage directory /mnt/var/lib/hadoop/journal/local/data/my-cluster does not exist
2016-04-11 19:06:39,170 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
1 on 8485): Formatting org.apache.hadoop.hdfs.qjournal.server.Journal@21e17c6d with namespace
info: lv=-63;cid=CID-72db83e1-fe0c-418a-a861-d9525f73adc8;nsid=1427415229;c=0;bpid=BP-1127919701-10.236.146.62-1460399925301
2016-04-11 19:06:39,170 INFO org.apache.hadoop.hdfs.server.common.Storage (IPC Server handler
1 on 8485): Formatting journal Storage Directory /mnt/var/lib/hadoop/journal/local/data/my-cluster
with nsid: 1427415229
2016-04-11 19:06:39,216 INFO org.apache.hadoop.hdfs.server.common.Storage (IPC Server handler
1 on 8485): Lock on /mnt/var/lib/hadoop/journal/local/data/my-cluster/in_use.lock acquired
by nodename 2774@ip-10-225-14-41.ec2.internal
2016-04-11 19:06:49,227 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
3 on 8485): Updating lastPromisedEpoch from 0 to 3 for client /10.139.37.248
2016-04-11 19:06:49,274 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
3 on 8485): Scanning storage FileJournalManager(root=/mnt/var/lib/hadoop/journal/local/data/my-cluster)
2016-04-11 19:06:49,294 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
3 on 8485): No files in FileJournalManager(root=/mnt/var/lib/hadoop/journal/local/data/my-cluster)
2016-04-11 19:06:49,356 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
4 on 8485): Prepared recovery for segment 1667: lastWriterEpoch: 0
2016-04-11 19:06:49,545 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
0 on 8485): Synchronizing log startTxId: 1667 endTxId: 3104 isInProgress: true: no current
segment in place
2016-04-11 19:06:49,553 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
0 on 8485): Synchronizing log startTxId: 1667 endTxId: 3104 isInProgress: true from http://ip-10-139-37-248.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=1667&storageInfo=-63%3A1427415229%3A0%3ACID-72db83e1-fe0c-418a-a861-d9525f73adc8
2016-04-11 19:06:50,336 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage (IPC Server
handler 0 on 8485): Image Transfer timeout configured to 60000 milliseconds
2016-04-11 19:06:50,616 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage (IPC Server
handler 0 on 8485): Transfer took 0.25s at 4079.68 KB/s
2016-04-11 19:06:50,632 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
0 on 8485): Accepted recovery for segment 1667: segmentState { startTxId: 1667 endTxId: 3104
isInProgress: true } acceptedInEpoch: 3
2016-04-11 19:06:50,646 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
2 on 8485): Validating log segment /mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_inprogress_0000000000000001667
about to be finalized
2016-04-11 19:06:50,807 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager (IPC
Server handler 2 on 8485): Finalizing edits file /mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_inprogress_0000000000000001667
-> /mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_0000000000000001667-0000000000000003104
2016-04-11 19:06:50,845 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler
4 on 8485): Updating lastWriterEpoch from 0 to 3 for client /10.139.37.248

Did I miss anything that could mess up the fsimage files?

Thanks in advance.

> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
>                 Key: HDFS-3743
>                 URL: https://issues.apache.org/jira/browse/HDFS-3743
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer takes over,
if they don't have any data for that namespace. However, this has a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg corresponding
to another cluster), it will auto-format a directory on those nodes. This doesn't cause any
data loss, but would be better to bail out with an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able to re-join
the cluster and start storing new segments without having to fail over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes undone), and the
user starts the NN, it should fail to start, because it may end up missing edits. If it auto-formats
in this case, the user might have silent "rollback" of the most recent edits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
