hadoop-hdfs-issues mailing list archives

From "Chen Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12210) Block Storage: volume creation times out while creating 3TB volume because of too many containers
Date Mon, 25 Sep 2017 23:53:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179972#comment-16179972 ]

Chen Liang commented on HDFS-12210:
-----------------------------------

Thanks [~msingh] for taking care of this, and thanks [~anu] for the reminder! +1 on the v002
patch. I've committed it to the feature branch. Thanks Mukul for the contribution!

> Block Storage: volume creation times out while creating 3TB volume because of too many containers
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12210
>                 URL: https://issues.apache.org/jira/browse/HDFS-12210
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12210-HDFS-7240.001.patch, HDFS-12210-HDFS-7240.002.patch
>
>
> Volume creation times out while creating 3TB volume because of too many containers
> {code}
> [hdfs@ctr-e134-1499953498516-64773-01-000003 ~]$ /opt/hadoop/hadoop-3.0.0-beta1-SNAPSHOT/bin/hdfs cblock -c bilbo disk1 3TB 4
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-3.0.0-beta1-SNAPSHOT/share/hadoop/hdfs/lib/logback-classic-1.0.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 17/07/28 09:32:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 17/07/28 09:32:40 INFO cli.CBlockCli: create volume:[bilbo, disk1, 3TB, 4]
> 17/07/28 09:33:10 ERROR cli.CBlockCli: java.net.SocketTimeoutException: Call From ctr-e134-1499953498516-64773-01-000003.hwx.site/172.27.51.64 to 0.0.0.0:9810 failed on socket timeout exception: java.net.SocketTimeoutException: 30000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.27.51.64:59317 remote=/0.0.0.0:9810]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
> {code}
> Looking into the logs, it can be seen that 614 containers were created for the volume before the timeout.
> {code}
> 2017-07-28 09:32:40,853 INFO org.apache.hadoop.cblock.CBlockManager: Create volume received: userName: bilbo volumeName: disk1 volumeSize: 3298534883328 blockSize: 4096
> 2017-07-28 09:32:42,545 INFO org.apache.hadoop.scm.client.ContainerOperationClient: Created container bilbo:disk1#0 leader:172.27.50.192:9866 machines:[172.27.50.192:9866] replication factor:1
> 2017-07-28 09:32:43,213 INFO org.apache.hadoop.scm.client.ContainerOperationClient: Created container bilbo:disk1#1 leader:172.27.51.65:9866 machines:[172.27.51.65:9866] replication factor:1
> 2017-07-28 09:32:43,484 INFO org.apache.hadoop.scm.client.ContainerOperationClient: Created container bilbo:disk1#2 leader:172.27.50.192:9866 machines:[172.27.50.192:9866] replication factor:1
> .
> .
> .
> .
> 2017-07-28 09:35:01,712 INFO org.apache.hadoop.scm.client.ContainerOperationClient: Created container bilbo:disk1#612 leader:172.27.50.128:9866 machines:[172.27.50.128:9866] replication factor:1
> 2017-07-28 09:35:01,963 INFO org.apache.hadoop.scm.client.ContainerOperationClient: Created container bilbo:disk1#613 leader:172.27.50.128:9866 machines:[172.27.50.128:9866] replication factor:1
> 2017-07-28 09:35:02,256 INFO org.apache.hadoop.scm.client.ContainerOperationClient: Created container bilbo:disk1#614 leader:172.27.50.192:9866 machines:[172.27.50.192:9866] replication factor:1
> 2017-07-28 09:35:02,358 INFO org.apache.hadoop.cblock.CBlockManager: Create volume received: userName: bilbo volumeName: disk2 volumeSize: 1099511627776 blockSize: 4096
> 2017-07-28 09:35:02,368 WARN org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9810, call Call#0 Retry#0 org.apache.hadoop.cblock.protocolPB.CBlockServiceProtocol.createVolume from 172.27.51.64:59317: output error
> 2017-07-28 09:35:02,369 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9810 caught an exception
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3242)
>         at org.apache.hadoop.ipc.Server.access$1700(Server.java:137)
>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1466)
>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1536)
>         at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2586)
>         at org.apache.hadoop.ipc.Server$Connection.access$300(Server.java:1608)
>         at org.apache.hadoop.ipc.Server$RpcCall.doResponse(Server.java:933)
>         at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:767)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
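
The figures in the logs above can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes a container size of 5 GiB, which the report does not state but which is consistent with the last container index (#614) for a volume of 3298534883328 bytes. The class and method names are hypothetical and are not part of cblock.

```java
import java.time.Duration;
import java.time.LocalTime;

class ContainerEstimate {
    // ASSUMPTION: 5 GiB per container, inferred from the log, not confirmed by the report.
    static final long ASSUMED_CONTAINER_BYTES = 5L << 30;

    // Number of containers needed to cover the volume (ceiling division).
    static long containersNeeded(long volumeBytes, long containerBytes) {
        return (volumeBytes + containerBytes - 1) / containerBytes;
    }

    // Milliseconds between two same-day timestamps in HH:mm:ss.SSS form.
    static long elapsedMillis(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).toMillis();
    }

    public static void main(String[] args) {
        long volumeBytes = 3298534883328L; // volumeSize from the CBlockManager log line
        long clientTimeoutMs = 30000L;     // timeout reported in the SocketTimeoutException

        long containers = containersNeeded(volumeBytes, ASSUMED_CONTAINER_BYTES);
        // From "Create volume received" to the creation of container #614.
        long elapsedMs = elapsedMillis("09:32:40.853", "09:35:02.256");

        System.out.println("containers needed: " + containers);
        System.out.println("server-side creation time (ms): " + elapsedMs);
        System.out.println("exceeds client timeout: " + (elapsedMs > clientTimeoutMs));
    }
}
```

Under that assumption the volume needs 615 containers (#0 through #614), and the interval between the request and the last container creation is 141403 ms, well beyond the 30000 ms client socket timeout.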



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

