hadoop-hdfs-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HDDS-1376) Datanode exits while executing client command when scmId is null
Date Thu, 11 Apr 2019 01:02:00 GMT

     [ https://issues.apache.org/jira/browse/HDDS-1376?focusedWorklogId=225897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225897 ]

ASF GitHub Bot logged work on HDDS-1376:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Apr/19 01:01
            Start Date: 11/Apr/19 01:01
    Worklog Time Spent: 10m 
      Work Description: arp7 commented on pull request #724: HDDS-1376. Datanode exits while executing client command when scmId is null
URL: https://github.com/apache/hadoop/pull/724#discussion_r274226324
 
 

 ##########
 File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/VersionEndpointTask.java
 ##########
 @@ -106,7 +106,8 @@ public VersionEndpointTask(EndpointStateMachine rpcEndPoint,
           volumeSet.writeUnlock();
         }
 
-        ozoneContainer.getDispatcher().setScmId(scmId);
+        // Start the container services after getting the version information
+        ozoneContainer.start(scmId);
 
 Review comment:
   How does the fix work? I don't understand this startup sequence well enough to figure it out.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 225897)
    Time Spent: 20m  (was: 10m)

> Datanode exits while executing client command when scmId is null
> ----------------------------------------------------------------
>
>                 Key: HDDS-1376
>                 URL: https://issues.apache.org/jira/browse/HDDS-1376
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.4.0
>            Reporter: Mukul Kumar Singh
>            Assignee: Hanisha Koneru
>            Priority: Major
>              Labels: MiniOzoneChaosCluster, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Ozone Datanode exits with the following error. This happens because the DN has not yet received an scmId from the SCM after registration but is already processing a client command.
> {code}
> 2019-04-03 17:02:10,958 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(133)) - Terminating with exit status 1: df6b578e-8d35-44f5-9b21-db7184dcc54e-RaftLogWorker failed.
> java.io.IOException: java.lang.NullPointerException: scmId cannot be null
>         at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>         at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
>         at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
>         at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:354)
>         at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:219)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: scmId cannot be null
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>         at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:110)
>         at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:243)
>         at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
>         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:350)
>         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:224)
>         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:149)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:347)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:354)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$0(ContainerStateMachine.java:385)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run$$$capture(CompletableFuture.java:1590)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more
> {code}
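
The ordering problem described above can be sketched in isolation. The class below is a hypothetical stand-in, not the real Ozone code: `ContainerServiceSketch`, `isStarted`, and the returned path string are assumptions made for illustration. It only models the idea suggested by the diff comment ("Start the container services after getting the version information"): the service records the scmId when it is started and does not process a create-container request before that point, so the request fails with a recoverable error rather than a null check deep inside the write path.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a datanode container service; not the real Ozone classes. */
final class ContainerServiceSketch {
    // Null until the version handshake with the SCM has supplied an id.
    private final AtomicReference<String> scmId = new AtomicReference<>();

    /** Models the idea of start(scmId): record the id, after which requests are served. */
    void start(String id) {
        scmId.set(Objects.requireNonNull(id, "scmId cannot be null"));
    }

    /** True once start(...) has been called with a non-null id. */
    boolean isStarted() {
        return scmId.get() != null;
    }

    /** Rejects requests that arrive before start(...) instead of using a null scmId. */
    String createContainer(long containerId) {
        String id = scmId.get();
        if (id == null) {
            throw new IllegalStateException("container service not started: scmId unknown");
        }
        // Assumed placeholder for the real container creation work.
        return id + "/container-" + containerId;
    }
}
```

In this sketch a request that arrives early is rejected explicitly. Whether the actual patch rejects early requests or simply does not accept connections until `start` runs is not stated in this message, so that detail should be read as an assumption.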



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

