hadoop-hdfs-issues mailing list archives

From "Nanda kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13309) Ozone: Improve error message in case of missing nodes
Date Wed, 04 Apr 2018 15:19:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425696#comment-16425696 ]

Nanda kumar commented on HDFS-13309:
------------------------------------

[~xyao] & [~elek], we still have to handle all the exceptions thrown by KSM/SCM properly in
the client. Since client-side exception handling is outside the scope of this patch, I'm
+1 on the change.

I will commit this shortly.

> Ozone: Improve error message in case of missing nodes
> -----------------------------------------------------
>
>                 Key: HDFS-13309
>                 URL: https://issues.apache.org/jira/browse/HDFS-13309
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: HDFS-7240
>    Affects Versions: HDFS-7240
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Minor
>         Attachments: HDFS-13309-HDFS-7240.001.patch, HDFS-13309-HDFS-7240.002.patch
>
>
> While testing ozonefs with Spark, I found multiple error messages in the log:
> {code}
> scm_1              | java.lang.NullPointerException
> scm_1              | 	at org.apache.hadoop.ozone.scm.container.ContainerStates.ContainerStateMap.addContainer(ContainerStateMap.java:129)
> scm_1              | 	at org.apache.hadoop.ozone.scm.container.ContainerStateManager.allocateContainer(ContainerStateManager.java:308)
> scm_1              | 	at org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:244)
> scm_1              | 	at org.apache.hadoop.ozone.scm.block.BlockManagerImpl.preAllocateContainers(BlockManagerImpl.java:189)
> scm_1              | 	at org.apache.hadoop.ozone.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:291)
> scm_1              | 	at org.apache.hadoop.ozone.scm.StorageContainerManager.allocateBlock(StorageContainerManager.java:1131)
> scm_1              | 	at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:109)
> scm_1              | 	at org.apache.hadoop.hdsl.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:8038)
> scm_1              | 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> scm_1              | 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
> scm_1              | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
> scm_1              | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
> scm_1              | 	at java.security.AccessController.doPrivileged(Native Method)
> scm_1              | 	at javax.security.auth.Subject.doAs(Subject.java:422)
> scm_1              | 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> scm_1              | 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)
> {code}
> The problem is that PipelineManager.getPipeline() may return null if a pipeline couldn't
> be found or established (for example, if there are not enough nodes for a Ratis ring).
> ContainerStateMap.addContainer, however, expects this pipeline to be non-null.
> I suggest adding a null check in ContainerStateManager.allocateContainer and returning
> a more meaningful error message.
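
The null check proposed in the issue description can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions: `Pipeline`, `PipelineSource`, `StateMap` and the `allocateContainer` helper below are hypothetical stand-ins, not the real Ozone SCM classes or signatures, and the use of `IOException` and the exact message text are assumptions rather than what the committed patch does.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for the Ozone SCM types discussed in HDFS-13309;
// these are NOT the real Ozone classes or signatures.
public class AllocateContainerSketch {

  /** Hypothetical pipeline: only a name, enough for the illustration. */
  static final class Pipeline {
    final String name;

    Pipeline(String name) {
      this.name = name;
    }
  }

  /** Hypothetical lookup that, like getPipeline() in the report, may return null. */
  interface PipelineSource {
    Pipeline getPipeline(String replicationType, int replicationFactor);
  }

  /** Hypothetical state map; like ContainerStateMap.addContainer, it assumes a non-null pipeline. */
  static final class StateMap {
    private final Map<String, Pipeline> containers = new HashMap<>();

    void addContainer(String containerName, Pipeline pipeline) {
      // Throws a bare NullPointerException on null, mirroring the reported stack trace.
      Objects.requireNonNull(pipeline);
      containers.put(containerName, pipeline);
    }

    int size() {
      return containers.size();
    }
  }

  /**
   * Looks up a pipeline and registers the container, checking for null first
   * so the caller gets a descriptive IOException instead of an NPE.
   */
  static void allocateContainer(PipelineSource source, StateMap map, String containerName,
      String replicationType, int replicationFactor) throws IOException {
    Pipeline pipeline = source.getPipeline(replicationType, replicationFactor);
    if (pipeline == null) {
      throw new IOException(String.format(
          "Unable to allocate container %s: no %s pipeline with replication factor %d "
              + "could be found or established (not enough healthy nodes?)",
          containerName, replicationType, replicationFactor));
    }
    map.addContainer(containerName, pipeline);
  }

  public static void main(String[] args) {
    StateMap map = new StateMap();
    // Simulates a cluster with a single datanode: only factor-1 pipelines exist.
    PipelineSource singleNode = (type, factor) -> factor <= 1 ? new Pipeline("p1") : null;
    try {
      allocateContainer(singleNode, map, "c1", "RATIS", 3);
    } catch (IOException e) {
      System.out.println("Allocation failed: " + e.getMessage());
    }
  }
}
```

The check sits in the caller, before the state map is touched, so the map never sees a null pipeline and the user sees which container, replication type and factor could not be served. The exception type and wording are illustrative choices only.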



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


