hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13826) S3A Deadlock in multipart copy due to thread pool limits.
Date Tue, 29 Nov 2016 11:59:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705101#comment-15705101 ]

Steve Loughran commented on HADOOP-13826:

They're kept consistent for a reason, not just because it simplifies configuration.

S3 stores each multipart upload part separately; reads aligned with those part boundaries
should perform better. That is, for maximum speed the s3a block size should equal the upload
partition size. By doing a copy with copy partition size == upload part size, we hope to
preserve that performance on later reads. Who knows, maybe it will even help copy performance.
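As a sketch, keeping the two aligned on a deployment might look like the following core-site.xml fragment (`fs.s3a.block.size` and `fs.s3a.multipart.size` are the relevant S3A properties; the 128 MB value is illustrative, not a recommendation):

```xml
<!-- core-site.xml: keep the S3A read block size and the multipart
     upload part size aligned. 134217728 bytes = 128 MB (illustrative). -->
<property>
  <name>fs.s3a.block.size</name>
  <value>134217728</value>
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <value>134217728</value>
</property>
```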

What would be ideal would be to know the part size of an object; HADOOP-13261 proposed adding
a custom header for this. However, time spent looking at split-calculation performance has
convinced me that a new header would be useless there: the overhead of querying the objects
makes it too expensive. We could start uploading it, though, and maybe use it for a copy. Still
expensive: based on my (ad-hoc) measurements of copy bandwidth of ~6 MB/s, a 400 ms HEAD request
costs roughly 2.4 MB of copy bandwidth.
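The back-of-envelope arithmetic behind that figure, using the ad-hoc numbers above (400 ms per HEAD, ~6 MB/s copy bandwidth; class and method names here are just for illustration):

```java
public class HeadCostEstimate {
    // Ad-hoc figures from the comment above: ~6 MB/s copy bandwidth, ~400 ms per HEAD.
    static final double COPY_MB_PER_SEC = 6.0;
    static final double HEAD_SECONDS = 0.4;

    /** Megabytes of copy bandwidth "spent" waiting on one HEAD request. */
    static double headCostInMB() {
        return COPY_MB_PER_SEC * HEAD_SECONDS;  // 6 MB/s * 0.4 s = 2.4 MB
    }

    public static void main(String[] args) {
        System.out.printf("One HEAD ~ %.1f MB of copy bandwidth%n", headCostInMB());
    }
}
```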

> S3A Deadlock in multipart copy due to thread pool limits.
> ---------------------------------------------------------
>                 Key: HADOOP-13826
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13826
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Sean Mackrory
>         Attachments: HADOOP-13826.001.patch, HADOOP-13826.002.patch
> In testing HIVE-15093 we have encountered deadlocks in the s3a connector. The TransferManager
javadocs (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html)
explain how this is possible:
> {quote}It is not recommended to use a single threaded executor or a thread pool with
a bounded work queue as control tasks may submit subtasks that can't complete until all sub
tasks complete. Using an incorrectly configured thread pool may cause a deadlock (I.E. the
work queue is filled with control tasks that can't finish until subtasks complete but subtasks
can't execute because the queue is filled).{quote}
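The failure mode the javadoc describes can be reproduced in miniature: a task running on the pool's only worker submits a subtask to the same pool and blocks on its result, so the subtask can never be scheduled. The sketch below (a generic ExecutorService demo, not the actual S3A/TransferManager code) uses a timeout on `get()` purely so the demonstration terminates instead of hanging:

```java
import java.util.concurrent.*;

public class PoolDeadlockDemo {
    /** Returns true if the control task gets stuck waiting on its own subtask. */
    public static boolean deadlocks() throws Exception {
        // One worker thread: the control task occupies it entirely.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Future<String> control = pool.submit(() -> {
            // The control task submits a subtask to the same pool...
            Future<String> sub = pool.submit(() -> "done");
            // ...and blocks on it. No worker is free, so this never returns.
            return sub.get();
        });
        try {
            control.get(500, TimeUnit.MILLISECONDS);
            return false;  // would mean the subtask somehow ran
        } catch (TimeoutException e) {
            return true;   // classic control-task/subtask deadlock
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deadlocks() ? "deadlocked" : "completed");
    }
}
```

With an unbounded pool (or separate pools for control tasks and subtasks) the subtask would get a thread and `sub.get()` would return, which is why the javadoc warns specifically against single-threaded executors and bounded work queues.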

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
