Return-Path:
X-Original-To: apmail-hadoop-common-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-common-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CB35117EAD for ; Wed, 30 Sep 2015 13:22:08 +0000 (UTC)
Received: (qmail 37046 invoked by uid 500); 30 Sep 2015 13:22:05 -0000
Delivered-To: apmail-hadoop-common-issues-archive@hadoop.apache.org
Received: (qmail 36998 invoked by uid 500); 30 Sep 2015 13:22:05 -0000
Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: common-issues@hadoop.apache.org
Delivered-To: mailing list common-issues@hadoop.apache.org
Received: (qmail 36984 invoked by uid 99); 30 Sep 2015 13:22:05 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Sep 2015 13:22:05 +0000
Date: Wed, 30 Sep 2015 13:22:05 +0000 (UTC)
From: "Thomas Demoor (JIRA)"
To: common-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HADOOP-11684) S3a to use thread pool that blocks clients
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HADOOP-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936822#comment-14936822 ]

Thomas Demoor commented on HADOOP-11684:
----------------------------------------

One has to take into account that s3a runs within the "Hadoop container" (Mapper / Reducer / ...). The new defaults allow for 3 (active uploads = threads.max) + 1 (queued upload = max.total.tasks) + 1 (active upload = in calling thread due to CallerRuns) = 5 concurrent uploads *per Hadoop container* on the node.
This should easily fill up the network pipe of the node, whereas, on my setup, the current (much higher) defaults cause starvation. However, with CallerRuns (003.patch), if extra upload attempts are made by *other threads*, they become concurrent uploads 6, 7, 8, ..., likely running the JVM out of memory.

[~stevel@apache.org], do you agree we need the approach from 002.patch, which is robust against this behaviour?

We've been running MR-style workflows on our test cluster with 002.patch for a while now (~2 months) and haven't run into any issues. Of course, additional testing (more workflows) would be welcome.

> S3a to use thread pool that blocks clients
> ------------------------------------------
>
>                 Key: HADOOP-11684
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11684
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>         Attachments: HADOOP-11684-001.patch, HADOOP-11684-002.patch, HADOOP-11684-003.patch
>
>
> Currently, if fs.s3a.max.total.tasks are queued and another (part)upload wants to start, a RejectedExecutionException is thrown.
> We should use a thread pool that blocks clients, nicely throttling them, rather than throwing an exception. E.g. something similar to https://github.com/apache/incubator-s4/blob/master/subprojects/s4-comm/src/main/java/org/apache/s4/comm/staging/BlockingThreadPoolExecutorService.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
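
For readers following the thread, here is a rough sketch of what "a thread pool that blocks clients" could look like. This is not the code from 002.patch, nor the S4 BlockingThreadPoolExecutorService linked above; the class name BlockingSubmitter, the parameters activeTasks and waitingTasks, and the peakInFlight() helper are all made up for illustration. The idea, assuming a plain fixed-size pool guarded by a semaphore: a caller must take a permit before handing work to the pool, so when all permits are taken the caller simply waits. It neither gets a RejectedExecutionException nor runs the upload itself as with CallerRuns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch only; not taken from any HADOOP-11684 patch. */
final class BlockingSubmitter {
    private final ExecutorService pool;
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** activeTasks plays the role of threads.max, waitingTasks of max.total.tasks (assumed mapping). */
    BlockingSubmitter(int activeTasks, int waitingTasks) {
        this.pool = Executors.newFixedThreadPool(activeTasks);
        // One permit per task that may be running or queued at any time.
        this.permits = new Semaphore(activeTasks + waitingTasks, true);
    }

    /** Blocks the calling thread until a permit is free, then hands the task to the pool. */
    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire();
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        Callable<T> wrapped = () -> {
            try {
                return task.call();
            } finally {
                // Decrement before releasing so inFlight never exceeds the permit count.
                inFlight.decrementAndGet();
                permits.release();
            }
        };
        try {
            return pool.submit(wrapped);
        } catch (RuntimeException e) {
            // The pool refused the task (e.g. after shutdown): give the permit back.
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
    }

    /** Highest number of running-plus-queued tasks observed so far. */
    int peakInFlight() {
        return peak.get();
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        BlockingSubmitter uploads = new BlockingSubmitter(3, 1);
        List<Future<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            final int part = i;
            // Stand-in for a part upload; submit() blocks once 3 + 1 tasks are in flight.
            parts.add(uploads.submit(() -> { Thread.sleep(10); return part; }));
        }
        for (Future<Integer> f : parts) {
            f.get();
        }
        uploads.shutdown();
        System.out.println("peak in-flight tasks: " + uploads.peakInFlight());
    }
}
```

With activeTasks = 3 and waitingTasks = 1, the sketch never has more than 4 tasks running or queued, no matter how many threads call submit(); extra callers just wait. That is the property that CallerRuns lacks when several threads upload at once.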