hadoop-hdfs-issues mailing list archives

From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
Date Wed, 12 Jul 2017 00:02:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083229#comment-16083229 ]

Anu Engineer commented on HDFS-12098:
-------------------------------------

@Weiwei Yang, can you please share your repro steps once again, or take a look at the test
patch that I have created?

I have added a call to disable SCM. When the tests run, I can see that we do not hit the SCM:
{code}
java.net.SocketTimeoutException: Call From hw11767.home/192.168.29.224 to 0.0.0.0:58880 failed
on socket timeout exception: java.net.SocketTimeoutException: 1000 millis timeout while waiting
for channel to be ready for read. ch : java.nio.channels
{code}

However, I am not able to see many Datanode state machine threads. Please see the attached
snapshot from my profiler.
I have also attached a test case that I developed to simulate and debug this case.

Thanks
Anu





> Ozone: Datanode is unable to register with scm if scm starts later
> ------------------------------------------------------------------
>
>                 Key: HDFS-12098
>                 URL: https://issues.apache.org/jira/browse/HDFS-12098
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ozone, scm
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Critical
>         Attachments: HDFS-12098-HDFS-7240.001.patch, HDFS-12098-HDFS-7240.002.patch,
>                      Screen Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Steps to reproduce:
> # Start the datanode.
> # Wait and check the datanode state. It has connection issues, which is expected.
> # Start SCM. The datanode is expected to connect to SCM and its state machine to
> transition to RUNNING. Instead, its state transitions to SHUTDOWN and the datanode enters
> chill mode.
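
The behavior expected in step 3 can be illustrated with a minimal sketch. This is not the Ozone datanode code: the class, the `ScmClient` interface, the `LateStartingScm` fake, and the method names are all hypothetical, and only the state names RUNNING and SHUTDOWN come from the description above (INIT is an assumed name for the pre-registration state). The sketch shows a state machine that treats a failed registration call as retryable and stays in its initial state until SCM becomes reachable, rather than moving to SHUTDOWN.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class RegistrationRetrySketch {
    /** Hypothetical states; INIT is an assumed name, the others appear in the issue. */
    enum State { INIT, RUNNING, SHUTDOWN }

    /** Hypothetical stand-in for the registration RPC a datanode sends to SCM. */
    interface ScmClient {
        void register() throws IOException;
    }

    /** Fake SCM that is unreachable for the first few attempts, then comes up. */
    static final class LateStartingScm implements ScmClient {
        private int failuresLeft;

        LateStartingScm(int failuresBeforeStart) {
            this.failuresLeft = failuresBeforeStart;
        }

        @Override
        public void register() throws IOException {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new SocketTimeoutException("SCM not reachable yet");
            }
        }
    }

    /** One step: try to register; on an I/O failure stay in INIT so the next step retries. */
    static State step(State current, ScmClient scm) {
        if (current != State.INIT) {
            return current;
        }
        try {
            scm.register();
            return State.RUNNING;
        } catch (IOException e) {
            // A connection failure is treated as retryable here, not as a reason to shut down.
            return State.INIT;
        }
    }

    /** Runs steps until the state leaves INIT or the attempt budget is used up. */
    static State runUntilSettled(ScmClient scm, int maxAttempts) {
        State state = State.INIT;
        for (int i = 0; i < maxAttempts && state == State.INIT; i++) {
            state = step(state, scm);
        }
        return state;
    }

    public static void main(String[] args) {
        // SCM "starts" after three failed attempts; the fourth attempt succeeds.
        System.out.println(runUntilSettled(new LateStartingScm(3), 10)); // prints RUNNING
    }
}
```

In a real datanode the retries would be spaced out by a heartbeat interval and run on a scheduler thread; the loop here only models the state transitions.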



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


