hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10931) libhdfs++: Fix object lifecycle issues in the BlockReader
Date Thu, 29 Sep 2016 23:55:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534452#comment-15534452 ]

Hadoop QA commented on HDFS-10931:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 54s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 45s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  4s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  1s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 16s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 13s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  1s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  2s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  0s{color} | {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_111. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m  1s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_111 Failed CTEST tests | test_libhdfs_threaded_hdfspp_test_shim_static |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:78fc6b6 |
| JIRA Issue | HDFS-10931 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831009/HDFS-10931.HDFS-8707.000.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux 279f94bcfc4b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HDFS-8707 / fbba214 |
| Default Java | 1.7.0_111 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111 |
| CTEST | https://builds.apache.org/job/PreCommit-HDFS-Build/16934/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_111-ctest.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16934/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.7.0_111.txt |
| JDK v1.7.0_111 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16934/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16934/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: Fix object lifecycle issues in the BlockReader
> ---------------------------------------------------------
>
>                 Key: HDFS-10931
>                 URL: https://issues.apache.org/jira/browse/HDFS-10931
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Critical
>         Attachments: HDFS-10931.HDFS-8707.000.patch
>
>
> The BlockReader can work itself into a state during AckRead (and possibly other stages as well) where the pipeline posts a task for asio with a pointer back into itself, then promptly calls "delete this" without canceling the asio request. The asio task finishes and tries to acquire the lock at the address where the DataNodeConnection used to live, but the DN connection is no longer valid, so it's scribbling on some arbitrary bit of memory. On some platforms the underlying address used by the mutex state will be handed out to future mutexes, so the scribble breaks that state and all the locks in that process start misbehaving.
> This can be reproduced by using the patch from HDFS-8790 and adding more worker threads plus a lot more reader threads.
> I'm going to fix this in two parts:
> 1) Duct tape + superglue patch to make sure that all top-level continuations in the block reader pipeline hold a shared_ptr to the DataNodeConnection. Nested continuations also get a copy of the shared_ptr to make sure the connection is alive. This at least keeps the connection alive so that it can keep returning asio::operation_aborted.
> 2) The continuation stuff needs a lot of work to make sure this type of bug doesn't keep popping up. We've already fixed these issues in the RPC code. This will most likely need to be split into a few JIRAs.
> - The continuation "framework" can be slimmed down quite a bit, perhaps even removed. Near-zero documentation + many implied contracts = constant bug chasing.
> - Add comments that actually describe what's going on in the networking code. This bug took significantly longer than it should have to track down because I hadn't worked on the BlockReader in a while.
> - No more "delete this".
> - Flatten out nested continuations, e.g. the guts of BlockReaderImpl::AckRead. It's unclear why they were implemented like this in the first place, and there are no comments to indicate that this was intentional.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


