hadoop-hdfs-dev mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty
Date Mon, 28 Aug 2017 03:56:00 GMT
Weiwei Yang created HDFS-12361:
----------------------------------

             Summary: Ozone: SCM failed to start when a container metadata is empty
                 Key: HDFS-12361
                 URL: https://issues.apache.org/jira/browse/HDFS-12361
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: ozone, scm
    Affects Versions: HDFS-7240
            Reporter: Weiwei Yang
            Assignee: Weiwei Yang


When I run tests that create keys via corona, some containers are occasionally left with empty metadata.
This can also happen if SCM is stopped before the container metadata has been written. When
this happens, we get the following error and SCM cannot be started:

{noformat}
17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool Block pool BP-821804790-172.16.165.133-1503887277256
(Datanode Uuid 7ee16a59-9604-406e-a0f8-6f44650a725b) service to ozone1.fyre.ibm.com/172.16.165.133:8111
java.lang.NullPointerException
	at org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66)
	at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210)
	at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158)
	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:99)
	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:77)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592)
	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

We should add a null check to avoid the NPE and mark such containers as inactive instead of failing SCM startup.
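
A minimal sketch of the proposed behaviour, assuming that an "empty" container shows up as null or zero-length metadata when it is read at startup (the path that goes through ContainerManagerImpl.readContainerInfo in the trace above). ContainerLoadSketch, LoadedContainer, load and loadAll are hypothetical names used only for illustration; they are not the real Ozone classes or the actual fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the container loading path; not the real Ozone code.
final class ContainerLoadSketch {

    // Hypothetical result type: a container name plus whether it is usable.
    static final class LoadedContainer {
        final String name;
        final boolean active;

        LoadedContainer(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    // Assumption: empty metadata is visible as null or as a zero-length array.
    // Instead of dereferencing it (and throwing an NPE), mark the container inactive.
    static LoadedContainer load(String name, byte[] metadata) {
        if (metadata == null || metadata.length == 0) {
            return new LoadedContainer(name, false);
        }
        // Real code would parse the protobuf metadata here.
        return new LoadedContainer(name, true);
    }

    // Loads every container; one bad container no longer aborts the whole startup.
    static List<LoadedContainer> loadAll(Map<String, byte[]> metadataByName) {
        List<LoadedContainer> result = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : metadataByName.entrySet()) {
            result.add(load(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> onDisk = new LinkedHashMap<>();
        onDisk.put("container-1", new byte[] {1, 2, 3}); // metadata present
        onDisk.put("container-2", new byte[0]);          // metadata file is empty
        onDisk.put("container-3", null);                 // metadata never written
        for (LoadedContainer c : loadAll(onDisk)) {
            System.out.println(c.name + " active=" + c.active);
        }
    }
}
```

The point of the sketch is that the check happens per container, so the remaining containers are still loaded and startup can continue.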



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

