hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mukul Kumar Singh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-1451) SCMBlockManager findPipeline and createPipeline are not lock protected
Date Tue, 07 May 2019 05:10:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834389#comment-16834389
] 

Mukul Kumar Singh commented on HDDS-1451:
-----------------------------------------

[~avijayan], yes you are absolutely correct. The problem is exactly the one you have described.
Some pipelines can be created between the point of check for pipeline and point of allocating
the pipeline and then the pipeline creation will fail

> SCMBlockManager findPipeline and createPipeline are not lock protected
> ----------------------------------------------------------------------
>
>                 Key: HDDS-1451
>                 URL: https://issues.apache.org/jira/browse/HDDS-1451
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.3.0
>            Reporter: Mukul Kumar Singh
>            Assignee: Aravindan Vijayan
>            Priority: Major
>              Labels: MiniOzoneChaosCluster
>
> SCM BlockManager may try to allocate pipelines in the cases when it is not needed. This
happens because BlockManagerImpl#allocateBlock is not lock protected, so multiple pipelines
can be allocated from it. One of the pipeline allocation can fail even when one of the existing
pipeline already exists.
> {code}
> 2019-04-22 22:34:14,336 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103))
-  pipeline Pipeline[ Id: 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5
> 5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f
> d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN]
> 2019-04-22 22:34:14,386 INFO  impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(134))
- e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection
> 2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103))
-  pipeline Pipeline[ Id: 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879
> 1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e
> c5{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN]
> 2019-04-22 22:34:14,388 INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - e17b7852-4691-40c7-8791-ad0b0da5201f:
start FollowerState
> 2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103))
-  pipeline Pipeline[ Id: 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c
> 7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1
> 8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN]
> 2019-04-22 22:34:14,389 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103))
-  pipeline Pipeline[ Id: be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876
> a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595
> 18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE,
State:OPEN]
> 2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103))
-  pipeline Pipeline[ Id: 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: 82632040-5754-4122-b187-331879586842{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103))
-  pipeline Pipeline[ Id: cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes: 91ae6848-a778-43be-a4a1-5855f7adc0d8{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip:
192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl (BlockManagerImpl.java:allocateBlock(192))
- Pipeline creation failed for type:RATIS factor:THREE
> org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline
of factor 3 using 2 nodes 20 healthy nodes 20 all nodes.
>         at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:122)
>         at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:57)
>         at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.createPipeline(SCMPipelineManager.java:148)
>         at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:190)
>         at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:172)
>         at org.apache.hadoop.ozone.protocolPB.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:82)
>         at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:7533)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> 2019-04-22 22:34:14,395 ERROR block.BlockManagerImpl (BlockManagerImpl.java:allocateBlock(213))
- Unable to allocate a block for the size: 16384, type: RATIS, factor: THREE
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message