hadoop-common-issues mailing list archives

From "Sean Mackrory (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13826) S3A Deadlock in multipart copy due to thread pool limits.
Date Tue, 06 Dec 2016 17:47:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-13826:
    Attachment: HADOOP-13826.003.patch

For the sake of trying stuff out, attaching a patch that gives an unbounded ThreadPoolExecutor
to the TransferManager, and the original bounded one to everything else.
All tests pass, including the new test that was previously able to induce a deadlock.

I like [~Thomas Demoor]'s point about the control tasks not being memory intensive. Keeping
control tasks in an unbounded queue, without having to worry about them overwhelming resources
too easily, would also resolve the concern about how to make all these individual pools easily
configurable. I'm fairly certain my original proposal would work more completely if the
outer-most executor immediately separated tasks into their own queues, rather than having 3
nested executors with only the inner-most one separating tasks into isolated pools. That would
still need to be done, and there is also still the concern about relying on internal AWS APIs,
which we should probably avoid.
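
To make the idea of separate queues concrete, here is a minimal sketch in plain java.util.concurrent.
It is not the attached patch and does not use any AWS SDK classes: SplitPoolsSketch, copy() and the
pool sizes are hypothetical names chosen for illustration. Lightweight control tasks go to an
unbounded pool, while the heavier subtasks go to a bounded one, so a control task waiting on its
subtasks never occupies a thread that a subtask needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Hadoop or the AWS SDK.
class SplitPoolsSketch {
    // Control tasks are assumed to be cheap, so their pool is left unbounded.
    private final ExecutorService controlPool = Executors.newCachedThreadPool();
    // Subtasks do the real work, so their thread count is capped.
    private final ExecutorService dataPool;

    SplitPoolsSketch(int dataThreads) {
        this.dataPool = Executors.newFixedThreadPool(dataThreads);
    }

    /** Control task: fans out one subtask per part, waits for all, returns how many finished. */
    Future<Integer> copy(int parts) {
        return controlPool.submit(() -> {
            List<Future<Integer>> subtasks = new ArrayList<>();
            for (int p = 0; p < parts; p++) {
                subtasks.add(dataPool.submit(() -> 1)); // stand-in for copying one part
            }
            int finished = 0;
            for (Future<Integer> subtask : subtasks) {
                finished += subtask.get(); // blocks a control thread, never a data thread
            }
            return finished;
        });
    }

    void shutdown() {
        controlPool.shutdown();
        dataPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SplitPoolsSketch sketch = new SplitPoolsSketch(1); // a single data thread is enough
        Future<Integer> first = sketch.copy(3);
        Future<Integer> second = sketch.copy(5);
        System.out.println(first.get(5, TimeUnit.SECONDS));  // 3
        System.out.println(second.get(5, TimeUnit.SECONDS)); // 5
        sketch.shutdown();
    }
}
```

The trade-off in this sketch is that the number of control threads is not limited at all, which
is only reasonable under the assumption above that control tasks are not memory intensive.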

> S3A Deadlock in multipart copy due to thread pool limits.
> ---------------------------------------------------------
>                 Key: HADOOP-13826
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13826
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Critical
>         Attachments: HADOOP-13826.001.patch, HADOOP-13826.002.patch, HADOOP-13826.003.patch
> In testing HIVE-15093 we have encountered deadlocks in the s3a connector. The TransferManager
> javadocs (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html)
> explain how this is possible:
> {quote}It is not recommended to use a single threaded executor or a thread pool with
> a bounded work queue as control tasks may submit subtasks that can't complete until all sub
> tasks complete. Using an incorrectly configured thread pool may cause a deadlock (I.E. the
> work queue is filled with control tasks that can't finish until subtasks complete but subtasks
> can't execute because the queue is filled).{quote}
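
The failure mode described in the quoted javadoc can be reproduced with plain java.util.concurrent,
without S3A or the AWS SDK. The sketch below is a simplified stand-in: ControlTaskDeadlockDemo
and completes() are hypothetical names, and it limits the number of threads rather than using
S3A's own blocking executor. Each control task submits a subtask to the same pool and waits for
it. The latch exists only to make the outcome deterministic by ensuring every control task
holds a thread before any subtask is submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not code from Hadoop or the AWS SDK.
class ControlTaskDeadlockDemo {

    /** Returns true if every control task finishes within the timeout, false if they hang. */
    static boolean completes(ExecutorService pool, int controlTasks, long timeoutMillis)
            throws Exception {
        CountDownLatch allRunning = new CountDownLatch(controlTasks);
        List<Future<?>> controls = new ArrayList<>();
        for (int i = 0; i < controlTasks; i++) {
            controls.add(pool.submit(() -> {
                allRunning.countDown();
                allRunning.await();                        // wait until all control tasks hold a thread
                Future<?> subtask = pool.submit(() -> { }); // subtask goes to the same pool
                subtask.get();                             // control task cannot finish before it
                return null;
            }));
        }
        try {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            for (Future<?> control : controls) {
                long remaining = Math.max(deadline - System.nanoTime(), 0);
                control.get(remaining, TimeUnit.NANOSECONDS);
            }
            return true;
        } catch (TimeoutException e) {
            return false;                                  // control tasks are stuck waiting
        } finally {
            pool.shutdownNow();                            // interrupt any stuck tasks so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        // Two threads, two control tasks: no thread is left to run the subtasks.
        System.out.println(completes(Executors.newFixedThreadPool(2), 2, 500)); // false
        // Unbounded pool: subtasks always get a thread.
        System.out.println(completes(Executors.newCachedThreadPool(), 2, 500)); // true
    }
}
```

With the fixed pool, both threads are held by control tasks and the subtasks sit in the queue
forever; the unbounded pool avoids this because a new thread is created for each subtask.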
