hadoop-common-issues mailing list archives

From "Sean Mackrory (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13826) S3A Deadlock in multipart copy due to thread pool limits.
Date Wed, 23 Nov 2016 17:05:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15690733#comment-15690733

Sean Mackrory commented on HADOOP-13826:

{quote}tests are pretty raw, production code less so.{quote}

Yeah, not remotely proposing this for inclusion yet - just a proof of concept. As I increased
the number of parallel renames, I started hitting deadlocks again. I had a thread pool dedicated
entirely to ControlMonitor tasks, and once that filled up, it deadlocked. I suspect this is
because my executor gets wrapped by other executors that share a single queue: if the next
item in that queue is a ControlMonitor task and the ControlMonitor pool is full, we're back
to square one. Rather than getting wrapped in 2 other types of executors (which add the listening
and blocking behavior, respectively), I think making this work would require bringing that
logic inside my S3TransferExecutor class, so that all tasks are segregated by
type as soon as they are handed off from the AWS SDK.
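To illustrate, here is a minimal sketch of that segregation idea - route each task to a pool for its type at submission time, so a full queue of part-copy subtasks can never starve the control tasks that are waiting on them. The class and method names (TypedTransferExecutor, submitControl, submitPart) are illustrative only, not from the actual patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class TypedTransferExecutor {
    // Control tasks may block on subtasks, so they get an unbounded pool;
    // part-copy subtasks share a small bounded worker pool.
    private final ExecutorService controlPool = Executors.newCachedThreadPool();
    private final ExecutorService partPool = Executors.newFixedThreadPool(2);

    <T> Future<T> submitControl(Callable<T> task) { return controlPool.submit(task); }
    <T> Future<T> submitPart(Callable<T> task)    { return partPool.submit(task); }

    void shutdown() { controlPool.shutdown(); partPool.shutdown(); }

    public static void main(String[] args) throws Exception {
        TypedTransferExecutor exec = new TypedTransferExecutor();
        // Several concurrent "renames", each a control task that fans out
        // part-copy subtasks and waits on them, as a multipart copy does.
        List<Future<String>> renames = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            renames.add(exec.submitControl(() -> {
                List<Future<Integer>> parts = new ArrayList<>();
                for (int p = 0; p < 4; p++) {
                    final int part = p;
                    parts.add(exec.submitPart(() -> part));
                }
                // Safe: part tasks never wait on control tasks, so the
                // small part pool drains and every rename completes.
                for (Future<Integer> f : parts) f.get();
                return "copied";
            }));
        }
        for (Future<String> r : renames) r.get(10, TimeUnit.SECONDS);
        System.out.println("all-renames-completed");
        exec.shutdown();
    }
}
```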

I'll hold off on actually implementing that until there's more consensus on whether that's even
the right approach. My approach definitely increased the number of parallel operations you
could get away with before hitting a deadlock, but until the entire executor chain segregates
tasks this way, it can't fix the core issue.

> S3A Deadlock in multipart copy due to thread pool limits.
> ---------------------------------------------------------
>                 Key: HADOOP-13826
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13826
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Sean Mackrory
>         Attachments: HADOOP-13826.001.patch, HADOOP-13826.002.patch
> In testing HIVE-15093 we have encountered deadlocks in the s3a connector. The TransferManager
> javadocs (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html)
> explain how this is possible:
> {quote}It is not recommended to use a single threaded executor or a thread pool with
> a bounded work queue as control tasks may submit subtasks that can't complete until all sub
> tasks complete. Using an incorrectly configured thread pool may cause a deadlock (I.E. the
> work queue is filled with control tasks that can't finish until subtasks complete but subtasks
> can't execute because the queue is filled).{quote}
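The failure mode the javadoc describes can be reproduced in a few lines. This is a minimal sketch against a plain single-threaded executor - the configuration the javadoc warns against - not the actual S3A/TransferManager code path:

```java
import java.util.concurrent.*;

public class BoundedPoolDeadlock {
    public static void main(String[] args) throws Exception {
        // A single-threaded executor: the configuration the
        // TransferManager javadoc explicitly warns against.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        // "Control" task: submits a subtask to the same pool and waits
        // for it. The subtask can never start, because the only worker
        // thread is occupied by the control task itself.
        Future<String> control = pool.submit(() -> {
            Future<String> subtask = pool.submit(() -> "part-copied");
            return subtask.get();   // blocks forever: deadlock
        });

        try {
            control.get(2, TimeUnit.SECONDS);
            System.out.println("completed");
        } catch (TimeoutException e) {
            System.out.println("deadlocked");
        } finally {
            pool.shutdownNow();
        }
    }
}
```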

This message was sent by Atlassian JIRA

