hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rakesh Radhakrishnan <rake...@apache.org>
Subject Re: Running a HA HDFS cluster on alpine linux
Date Mon, 29 Aug 2016 03:07:23 GMT
Hi Francis,

There could be cases of connection fluctuations between ZKFC and ZK server,
I've observed the following message from your logs. I'd suggest you to
start analyzing all your ZooKeeper servers log messages and see ZooKeeper
cluster status during this period. BTW, could you tell me the ZK cluster
size.

16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x156c3dc679a0003, likely server has closed socket,
closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected.
Entering neutral mode...

Regards,
Rakesh

On Sat, Aug 27, 2016 at 5:16 PM, F21 <f21.groups@gmail.com> wrote:

> Hi all,
>
> I am currently experimenting with running a HA HDFS cluster in docker
> containers. I have successfully created an HA cluster using Ubuntu as my
> base image for running the namenode, datanode and journalnodes. The
> zookeeper instance runs on an image built using Alpine linux as the base
> and works pretty well.
>
> I attempted to get the namenode, datanode and journalnodes running using
> Alpine linux as the base image. The datanode and journalnodes seem to work
> fine. However, while the namenodes start correctly, they seem to disconnect
> from Zookeeper quite often and will transition into neutral mode. This
> results in the namenodes being in "startup mode" forever.
>
> These are the logs from the active namenode:
> 16/08/25 22:40:28 INFO blockmanagement.CacheReplicationMonitor: Starting
> CacheReplicationMonitor with interval 30000 milliseconds
> 16/08/25 22:40:28 INFO fs.TrashPolicyDefault: Namenode trash
> configuration: Deletion interval = 1440 minutes, Emptier interval = 0
> minutes.
> 16/08/25 22:40:28 INFO fs.TrashPolicyDefault: The configured checkpoint
> interval is 0 minutes. Using an interval of 1440 minutes that is used for
> deletion instead
> 16/08/25 22:40:28 INFO blockmanagement.BlockManager: Total number of
> blocks            = 0
> 16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of invalid
> blocks          = 0
> 16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of
> under-replicated blocks = 0
> 16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of
> over-replicated blocks = 0
> 16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of blocks
> being written    = 0
> 16/08/25 22:40:28 INFO hdfs.StateChange: STATE* Replication Queue
> initialization scan for invalid, over- and under-replicated blocks
> completed in 687 msec
> 16/08/25 22:40:28 INFO ha.ZKFailoverController: Successfully transitioned
> NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
> 16/08/25 22:40:30 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> mkdir: `/tmp': File exists
> 16/08/25 22:40:32 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> getfileinfo is: 0
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> mkdirs is: 0
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> listStatus is: 0
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> * is: 0
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> setPermission is: 0
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> getfileinfo is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> mkdirs is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> listStatus is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> * is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> setPermission is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> getfileinfo is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> mkdirs is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> listStatus is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> * is: 1
> 16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command
> setPermission is: 1
> 16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x156c3dc679a0003, likely server has closed
> socket, closing socket connection and attempting reconnect
> 16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected.
> Entering neutral mode...
> 16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection to
> server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection established
> to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
> 16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment
> complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid =
> 0x156c3dc679a0003, negotiated timeout = 5000
> 16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
> 16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Checking for any old
> active which needs to be fenced...
> 16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Old node exists:
> 0a036e733112036e6e311a136d3965646435312d6e6e312e6d39656464353120d43e28d33e
> 16/08/25 22:41:41 INFO ha.ActiveStandbyElector: But old node has our own
> data, so don't need to fence it.
> 16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Writing znode
> /mycluster/ns1/ActiveBreadCrumb to indicate that the local node is the
> most recent active...
> 16/08/25 22:41:41 INFO ha.ZKFailoverController: Trying to make NameNode at
> m9edd51-nn1.m9edd51/172.18.0.7:8020 active...
> 16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully transitioned
> NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> getfileinfo is: 0
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> mkdirs is: 0
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> listStatus is: 0
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> * is: 0
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> setPermission is: 0
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> getfileinfo is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> mkdirs is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> listStatus is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> * is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> setPermission is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> getfileinfo is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> mkdirs is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> listStatus is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> * is: 1
> 16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command
> setPermission is: 1
> 16/08/25 22:42:28 INFO namenode.FSNamesystem: Roll Edit Log from 172.18.0.8
> 16/08/25 22:42:28 INFO namenode.FSEditLog: Rolling edit logs
> 16/08/25 22:42:28 INFO namenode.FSEditLog: Ending log segment 1
> 16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4 Total
> time for transactions(ms): 16 Number of transactions batched in Syncs: 0
> Number of syncs: 3 SyncTimes(ms): 40 17
> 16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4 Total
> time for transactions(ms): 16 Number of transactions batched in Syncs: 0
> Number of syncs: 4 SyncTimes(ms): 50 18
> 16/08/25 22:42:28 INFO namenode.FileJournalManager: Finalizing edits file
> /var/lib/hadoop/name/current/edits_inprogress_0000000000000000001 ->
> /var/lib/hadoop/name/current/edits_0000000000000000001-0000000000000000004
> 16/08/25 22:42:28 INFO namenode.FSEditLog: Starting log segment at 5
>
> From the standby namenode:
> 16/08/25 22:40:33 INFO ha.HealthMonitor: Entering state SERVICE_HEALTHY
> 16/08/25 22:40:33 INFO ha.ZKFailoverController: Local service NameNode at
> m9edd51-nn2.m9edd51/172.18.0.8:8020 entered state: SERVICE_HEALTHY
> 16/08/25 22:40:33 INFO ha.ZKFailoverController: ZK Election indicated that
> NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
> 16/08/25 22:40:33 INFO ha.ZKFailoverController: Successfully transitioned
> NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby state
> 16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x156c3dc679a0004, likely server has closed
> socket, closing socket connection and attempting reconnect
> 16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected.
> Entering neutral mode...
> 16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection to
> server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection established
> to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
> 16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment
> complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid =
> 0x156c3dc679a0004, negotiated timeout = 5000
> 16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
> 16/08/25 22:41:41 INFO ha.ZKFailoverController: ZK Election indicated that
> NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
> 16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully transitioned
> NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby state
> 16/08/25 22:42:28 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:42:28 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@3b1f16f0 expecting start txid #1
> 16/08/25 22:42:28 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 1
> 16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 1
> 16/08/25 22:42:29 INFO namenode.FSImage: Edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=1&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 133 edits # 4
> loaded in 0 seconds
> 16/08/25 22:42:29 INFO ha.EditLogTailer: Loaded 4 edits starting from txid
> 0
> 16/08/25 22:44:29 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:44:29 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c92162
> expecting start txid #5
> 16/08/25 22:44:29 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=5&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=5&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 5
> 16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 5
> 16/08/25 22:44:29 INFO namenode.FSImage: Edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=5&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=5&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds
> 16/08/25 22:44:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid
> 4
> 16/08/25 22:46:29 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:46:29 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@1f890896 expecting start txid #7
> 16/08/25 22:46:29 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=7&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=7&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 7
> 16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 7
> 16/08/25 22:46:29 INFO namenode.FSImage: Edits file
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=7&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=7&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds
> 16/08/25 22:46:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid
> 6
> 16/08/25 22:48:29 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:48:29 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@4da07bde expecting start txid #9
> 16/08/25 22:48:29 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=9&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=9&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 9
> 16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 9
> 16/08/25 22:48:30 INFO namenode.FSImage: Edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=9&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=9&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds
> 16/08/25 22:48:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid
> 8
> 16/08/25 22:50:30 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:50:30 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@350cea58 expecting start txid #11
> 16/08/25 22:50:30 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 11
> 16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 11
> 16/08/25 22:50:30 INFO namenode.FSImage: Edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=11&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds
> 16/08/25 22:50:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid
> 10
> 16/08/25 22:52:30 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:52:30 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@6a6c8f51 expecting start txid #13
> 16/08/25 22:52:30 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=13&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=13&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 13
> 16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 13
> 16/08/25 22:52:30 INFO namenode.FSImage: Edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=13&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=13&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds 16/08/25 22:52:30 INFO ha.EditLogTailer: Loaded 2 edits
> starting from txid 12
> 16/08/25 22:54:30 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:54:30 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@8212238
> expecting start txid #15
> 16/08/25 22:54:30 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 15
> 16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 15
> 16/08/25 22:54:30 INFO namenode.FSImage: Edits file
> http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=15&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds
> 16/08/25 22:54:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid
> 14
> 16/08/25 22:56:30 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:56:30 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@58d1e7d8 expecting start txid #17
> 16/08/25 22:56:30 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=17&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=17&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 17
> 16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 17
> 16/08/25 22:56:30 INFO namenode.FSImage: Edits file
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=17&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=17&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds 16/08/25 22:56:30 INFO ha.EditLogTailer: Loaded 2 edits
> starting from txid 16
> 16/08/25 22:58:30 INFO ha.EditLogTailer: Triggering log roll on remote
> NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
> 16/08/25 22:58:30 INFO namenode.FSImage: Reading
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInput
> Stream@791c80ff expecting start txid #19
> 16/08/25 22:58:30 INFO namenode.FSImage: Start loading edits file
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=19&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=19&storageInfo=-63%3A2072890455%3A0%3Amycluster
> 16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 19
> 16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding stream
> 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentT
> xId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 19
> 16/08/25 22:58:30 INFO namenode.FSImage: Edits file
> http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=19&storageInfo=-63%3A2072890455%3A0%3Amycluster,
> http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTx
> Id=19&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2
> loaded in 0 seconds 16/08/25 22:58:30 INFO ha.EditLogTailer: Loaded 2 edits
> starting from txid 18
>
> Any ideas where I can start to debug this? This seems to be a problem
> specific to the zookeeper failover controller. I tried this docker image
> that starts a standalone cluster with 2 namenodes, datanodes and journal
> nodes, but no zookeeper failover controller and it works correctly on
> Alpline linux: https://github.com/gustavonalle/yarn-docker
>
> Cheers,
> Francis
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: user-help@hadoop.apache.org
>
>

Mime
View raw message