hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12098) Ozone: Datanode is unable to register with scm if scm starts later
Date Mon, 17 Jul 2017 05:42:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089336#comment-16089336
] 

Hadoop QA commented on HDFS-12098:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 19s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 39s{color}
| {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} |
{color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color}
| {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 59s{color} |
{color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 52s{color} |
{color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} |
{color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 52s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 50s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 50s{color} | {color:green}
the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 36s{color}
| {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 16 new + 154 unchanged
- 0 fixed = 170 total (was 154) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m  0s{color} | {color:red}
hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 49s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 23s{color} | {color:red}
hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 13s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  Inconsistent synchronization of org.apache.hadoop.hdfs.server.datanode.DataNode.datanodeStateMachine;
locked 42% of time  Unsynchronized access at DataNode.java:42% of time  Unsynchronized access
at DataNode.java:[line 3228] |
| Failed junit tests | hadoop.ozone.container.replication.TestContainerReplicationManager
|
|   | hadoop.ozone.TestMiniOzoneCluster |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.ozone.TestStorageContainerManager |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-12098 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12877520/HDFS-12098-HDFS-7240.testcase.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux a4fe1c2f42ae 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-7240 / 1bec6a1 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/20299/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
|
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/20299/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
|
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/20299/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/20299/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/20299/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Ozone: Datanode is unable to register with scm if scm starts later
> ------------------------------------------------------------------
>
>                 Key: HDFS-12098
>                 URL: https://issues.apache.org/jira/browse/HDFS-12098
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, ozone, scm
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Critical
>         Attachments: disabled-scm-test.patch, HDFS-12098-HDFS-7240.001.patch, HDFS-12098-HDFS-7240.002.patch,
HDFS-12098-HDFS-7240.testcase.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png, thread_dump.log
>
>
> Reproducing steps
> 1. Start namenode
> {{./bin/hdfs --daemon start namenode}}
> 2. Start datanode
> {{./bin/hdfs datanode}}
> will see following connection issues
> {noformat}
> 17/07/13 21:16:48 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861.
Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
> 17/07/13 21:16:49 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861.
Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
> 17/07/13 21:16:50 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861.
Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
> 17/07/13 21:16:51 INFO ipc.Client: Retrying connect to server: ozone1.fyre.ibm.com/172.16.165.133:9861.
Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1 SECONDS)
> {noformat}
> this is expected because scm is not started yet
> 3. Start scm
> {{./bin/hdfs scm}}
> expecting datanode can register to this scm, expecting the log in scm
> {noformat}
> 17/07/13 21:22:30 INFO node.SCMNodeManager: Data node with ID: af22862d-aafa-4941-9073-53224ae43e2c
Registered.
> {noformat}
> but did *NOT* see this log. (_I debugged into the code and found the datanode state was
transited SHUTDOWN unexpectedly because the thread leaks, each of those threads counted to
set to next state and they all set to SHUTDOWN state_)
> 4. Create a container from scm CLI
> {{./bin/hdfs scm -container -create -c 20170714c0}}
> this fails with following exception
> {noformat}
> Creating container : 20170714c0.
> Error executing command:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.scm.exceptions.SCMException):
Unable to create container while in chill mode
> 	at org.apache.hadoop.ozone.scm.container.ContainerMapping.allocateContainer(ContainerMapping.java:241)
> 	at org.apache.hadoop.ozone.scm.StorageContainerManager.allocateContainer(StorageContainerManager.java:392)
> 	at org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.allocateContainer(StorageContainerLocationProtocolServerSideTranslatorPB.java:73)
> {noformat}
> datanode was not registered to scm, thus it's still in chill mode.
> *Note*, if we start scm first, there is no such issue, I can create container from CLI
without any problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message