hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12361) Ozone: SCM failed to start when a container metadata is empty
Date Tue, 24 Apr 2018 20:55:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450962#comment-16450962 ]

Hudson commented on HDFS-12361:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
HDFS-12361. Ozone: SCM failed to start when a container metadata is (aengineer: rev b06f4f63e350d3276da989843ab778d3b5679ae8)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerManagerImpl.java


> Ozone: SCM failed to start when a container metadata is empty
> -------------------------------------------------------------
>
>                 Key: HDFS-12361
>                 URL: https://issues.apache.org/jira/browse/HDFS-12361
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone, scm
>    Affects Versions: HDFS-7240
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>         Attachments: HDFS-12361-HDFS-7240.001.patch
>
>
> When I run tests that create keys via corona, some containers are occasionally left with empty metadata. This can also happen if SCM stops at a point where the metadata has not yet been written. When it happens, we get the following error and SCM cannot be started:
> {noformat}
> 17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool Block pool BP-821804790-172.16.165.133-1503887277256 (Datanode Uuid 7ee16a59-9604-406e-a0f8-6f44650a725b) service to ozone1.fyre.ibm.com/172.16.165.133:8111
> java.lang.NullPointerException
> 	at org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66)
> 	at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210)
> 	at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158)
> 	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:99)
> 	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:77)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We should add a null check to avoid the NPE and mark such containers as inactive, without failing the SCM.
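
For illustration only, the proposed handling could look roughly like the sketch below. This is not the committed patch: ContainerLoadSketch, Metadata, Status and the example paths are hypothetical names invented here, and the real ContainerManagerImpl and ContainerData classes are not reproduced. The only point is that a container whose metadata parses to null is recorded as inactive instead of throwing during startup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types are the real Ozone classes. */
public class ContainerLoadSketch {

    /** Stand-in for the parsed metadata of one container. */
    public static final class Metadata {
        public final String containerName;

        public Metadata(String containerName) {
            this.containerName = containerName;
        }
    }

    /** Stand-in for the per-container status kept by the loader. */
    public static final class Status {
        public final Metadata metadata; // null when the container is inactive
        public final boolean active;

        public Status(Metadata metadata, boolean active) {
            this.metadata = metadata;
            this.active = active;
        }
    }

    // Containers seen at startup, keyed by the path of their metadata file.
    private final Map<String, Status> containers = new LinkedHashMap<>();

    /**
     * Registers one container. Empty metadata (modelled here as null) is
     * tolerated: the container is marked inactive and loading continues.
     */
    public void readContainerInfo(String path, Metadata parsed) {
        if (parsed == null || parsed.containerName == null) {
            // Null check in place of the dereference that caused the NPE.
            containers.put(path, new Status(null, false));
            return;
        }
        containers.put(path, new Status(parsed, true));
    }

    /** Returns true only for a known container with usable metadata. */
    public boolean isActive(String path) {
        Status status = containers.get(path);
        return status != null && status.active;
    }

    /** Number of containers recorded, active or not. */
    public int size() {
        return containers.size();
    }
}
```

With this shape, calling readContainerInfo("/data/c2.container", null) leaves the loader usable and isActive("/data/c2.container") returns false, so one bad container does not block startup.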



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

