Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-issues@hadoop.apache.org
Date: Wed, 30 Sep 2015 13:22:05 +0000 (UTC)
From: "Thomas Demoor (JIRA)" <jira@apache.org>
To: common-issues@hadoop.apache.org
Message-ID: <JIRA.12780109.1425658702000.112663.1443619325411@Atlassian.JIRA>
In-Reply-To: <JIRA.12780109.1425658702000@Atlassian.JIRA>
References: <JIRA.12780109.1425658702000@Atlassian.JIRA>
 <JIRA.12780109.1425658702068@arcas>
Subject: [jira] [Commented] (HADOOP-11684) S3a to use thread pool that
 blocks clients
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936822#comment-14936822 ] 

Thomas Demoor commented on HADOOP-11684:
----------------------------------------

One has to take into account that s3a runs within the "Hadoop container" (Mapper / Reducer / ...). The new defaults allow for 3 (active uploads = threads.max) + 1 (queued upload = max.total.tasks) + 1 (active upload = in calling thread due to CallerRuns) = 5 concurrent uploads *per Hadoop container* on the node. This should easily saturate the node's network pipe, whereas, on my setup, the current (much higher) defaults cause starvation.
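
The arithmetic above can be checked against a plain java.util.concurrent.ThreadPoolExecutor. What follows is only a minimal sketch, not the s3a code: the class and method names are made up for illustration, and the values 3 and 1 are the defaults quoted above. It counts pool threads + queued tasks + the one task the caller is forced to run itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Hadoop.
class CallerRunsCount {

    /** Uploads in flight at the moment the submitting thread has to run one itself. */
    static int inFlightWhenCallerRuns(int maxThreads, int maxQueued) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(maxQueued),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Thread caller = Thread.currentThread();
        AtomicInteger observed = new AtomicInteger(-1);

        Runnable fakeUpload = () -> {
            if (Thread.currentThread() == caller) {
                // Rejected by the pool, so CallerRunsPolicy runs it right here.
                observed.set(pool.getPoolSize() + pool.getQueue().size() + 1);
            } else {
                // Pool threads hold their "upload" open until released.
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };

        // Fill the threads, then the queue, then submit one task too many.
        for (int i = 0; i < maxThreads + maxQueued + 1; i++) {
            pool.execute(fakeUpload);
        }
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed.get();
    }

    public static void main(String[] args) {
        System.out.println("in flight: " + inFlightWhenCallerRuns(3, 1));
    }
}
```

With 3 threads and a queue of 1 this should report 5, matching the count above for a single submitting thread.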

With CallerRuns (003.patch), however, this bound only holds for a single calling thread: any extra upload attempts made by *other threads* run in those threads and become concurrent uploads 6, 7, 8, ..., likely running the JVM out of memory. [~stevel@apache.org], do you agree we need the approach from 002.patch, which is robust against this behaviour?
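
To illustrate the difference, here is a rough sketch of a submit path that blocks callers instead of running the task in the calling thread. It is not the code from 002.patch or from the S4 class referenced in the description. The class name, fields and the semaphore-based design are assumptions chosen for illustration. The point is that the number of in-flight uploads stays bounded by threads + queued no matter how many threads submit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not part of Hadoop.
class BlockingSubmitSketch {
    final ExecutorService pool;
    final Semaphore permits;                       // one permit per running or queued task
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger peak = new AtomicInteger();

    BlockingSubmitSketch(int threads, int queued) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(threads + queued);
    }

    /** Blocks the calling thread until a slot is free, then hands the task to the pool. */
    Future<?> submit(Runnable task) throws InterruptedException {
        permits.acquire();
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        Runnable wrapped = () -> {
            try {
                task.run();
            } finally {
                inFlight.decrementAndGet();
                permits.release();                 // wake up one blocked caller
            }
        };
        try {
            return pool.submit(wrapped);
        } catch (RuntimeException e) {
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
    }

    /** Many threads submit fake uploads; returns the highest in-flight count seen. */
    static int demoPeak(int threads, int queued, int submitters, int tasksEach) {
        BlockingSubmitSketch s = new BlockingSubmitSketch(threads, queued);
        Runnable fakeUpload = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread[] callers = new Thread[submitters];
        for (int i = 0; i < submitters; i++) {
            callers[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < tasksEach; j++) {
                        s.submit(fakeUpload);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            callers[i].start();
        }
        try {
            for (Thread t : callers) {
                t.join();
            }
            s.pool.shutdown();
            s.pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return s.peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + demoPeak(3, 1, 8, 5));
    }
}
```

Here eight submitting threads share one limit of 3 + 1 = 4, so the reported peak cannot exceed 4, whereas under CallerRuns each of those threads could add an upload of its own.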

We've been running MR-style workflows on our test cluster with 002.patch for a while now (~2 months) and haven't run into any issues. Of course, additional testing (more workflows) would be welcome.


> S3a to use thread pool that blocks clients
> ------------------------------------------
>
>                 Key: HADOOP-11684
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11684
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>         Attachments: HADOOP-11684-001.patch, HADOOP-11684-002.patch, HADOOP-11684-003.patch
>
>
> Currently, if fs.s3a.max.total.tasks tasks are already queued and another (part)upload wants to start, a RejectedExecutionException is thrown.
> We should use a thread pool that blocks clients, nicely throttling them, rather than throwing an exception. For instance, something similar to https://github.com/apache/incubator-s4/blob/master/subprojects/s4-comm/src/main/java/org/apache/s4/comm/staging/BlockingThreadPoolExecutorService.java

