Date: Thu, 19 Sep 2019 12:18:00 +0000 (UTC)
From: "Steve Loughran (Jira)" <jira@apache.org>
To: common-issues@hadoop.apache.org
Message-ID: <JIRA.13256511.1568375638000.11932.1568895480312@Atlassian.JIRA>
In-Reply-To: <JIRA.13256511.1568375638000@Atlassian.JIRA>
References: <JIRA.13256511.1568375638000@Atlassian.JIRA> <JIRA.13256511.1568375638754@jira-he-de>
Subject: [jira] [Commented] (HADOOP-16570) S3A committers leak
 threads/raises OOM on job/task commit at scale
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/HADOOP-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933323#comment-16933323 ] 

Steve Loughran commented on HADOOP-16570:
-----------------------------------------

h2. Plan

h3. now: 

* listPendingUploads lists all .pendingset files and loads these JSON files (in separate threads) to produce a single list of all files to commit or abort.
* commit/abort is done in the thread pool
* The _SUCCESS file lists all the written files; it gets used in tests to verify that (a) the number of files is > 0 and (b) the filenames match the store state, expected values, etc.

h3. proposed

* list all .pendingset files
* hand off both load and commit/abort to the threads, so the number of files loaded at any one time is limited to the number of active threads.
* limit the size of the success file to the first, say, 500 entries; a counter field will be updated to give the final number. This will be enough for all the integration tests I know of that use the file.
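
To make the proposed flow concrete, here is a rough sketch of what "load and commit inside the worker threads, with a capped success summary" could look like. It is not the S3A committer code: the class, `commitAll`, the `loader` and `committer` callbacks and the `Summary` type are all hypothetical names chosen for illustration, and paths and uploads are plain strings. The only things taken from the plan above are the fixed-size pool, the per-thread load, the capped entry list and the separate counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

public class IncrementalCommitSketch {

  /** Hypothetical success-file payload: a capped sample plus the full count. */
  public static final class Summary {
    public final List<String> sampleFilenames;
    public final long totalFiles;

    Summary(List<String> sampleFilenames, long totalFiles) {
      this.sampleFilenames = sampleFilenames;
      this.totalFiles = totalFiles;
    }
  }

  /**
   * Load each pendingset file and commit its uploads inside a worker thread,
   * so at most {@code threads} pendingset files are held in memory at once.
   * loader and committer are stand-ins (assumptions) for the real operations.
   */
  public static Summary commitAll(
      List<String> pendingsetPaths,
      Function<String, List<String>> loader,
      Consumer<String> committer,
      int threads,
      int maxSampleSize) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<String> sample = new ArrayList<>();
    AtomicLong total = new AtomicLong();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String path : pendingsetPaths) {
        futures.add(pool.submit(() -> {
          // The load happens here, in the worker, not up front.
          for (String upload : loader.apply(path)) {
            committer.accept(upload);
            total.incrementAndGet();
            synchronized (sample) {
              // Keep only the first maxSampleSize names; the counter has the rest.
              if (sample.size() < maxSampleSize) {
                sample.add(upload);
              }
            }
          }
        }));
      }
      // Surface the first failure, if any, to the caller.
      for (Future<?> f : futures) {
        f.get();
      }
    } finally {
      // Always release the threads, whatever the outcome.
      pool.shutdown();
    }
    return new Summary(new ArrayList<>(sample), total.get());
  }
}
```

With a cap of 500, a test can still assert that the count is > 0 and that the sampled names exist in the store, without the summary growing with the size of the job. Note that the sketch does nothing about rollback when `f.get()` throws; that is the trouble spot discussed next.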

h2. Troublespots

h3. Handling failure to load a file, i.e. rolling back the commit

* We currently encounter all failures to load any file before a single upload has been committed. With incremental load and commit, that no longer holds: a load failure can surface partway through the operation, after some uploads have already been committed.
* We currently roll back failures to commit, during the phase where we complete uploads, by deleting the files. For that we need the entire list of committed files.

h3. Partition directory output

The partitioned committer uses the list of pending uploads to identify leaf directories and apply its policy to them:

* the Fail policy fails the commit before a single file has been written.
* the Replace policy deletes all the files in those directories.
* the Append policy doesn't care.

The only way to implement the same checks with incremental loads will be to do an initial scan to build up the tree of leaf directories and then apply the chosen policy.

Together this implies we'll probably have to do an initial preload scan of all pending files, at least for that partitioned committer. It's the one where we don't want to write a single file if there are problems, and we need to build that tree up.
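
A minimal sketch of that preload scan, under some assumptions: destination paths are plain strings, `exists` is a hypothetical stand-in for a filesystem probe, and the Fail policy is taken to mean "fail if a destination leaf directory already exists". None of the names below come from the committer code.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

public class PartitionPolicySketch {

  /** Hypothetical enum mirroring the three policies described above. */
  public enum ConflictMode { FAIL, REPLACE, APPEND }

  /** Collect the parent directory of every destination file. */
  public static Set<String> leafDirectories(Collection<String> destPaths) {
    Set<String> dirs = new TreeSet<>();
    for (String path : destPaths) {
      int slash = path.lastIndexOf('/');
      dirs.add(slash > 0 ? path.substring(0, slash) : "/");
    }
    return dirs;
  }

  /**
   * Apply the policy before any upload is committed.
   * Returns the directories whose contents must be deleted first (REPLACE);
   * throws for FAIL if any leaf directory already exists (an assumption
   * about what "fails to commit" is triggered by).
   */
  public static Set<String> applyPolicy(
      Set<String> leafDirs, Predicate<String> exists, ConflictMode mode) {
    Set<String> toDelete = new TreeSet<>();
    for (String dir : leafDirs) {
      if (!exists.test(dir)) {
        continue;
      }
      switch (mode) {
        case FAIL:
          throw new IllegalStateException("Destination already exists: " + dir);
        case REPLACE:
          toDelete.add(dir);
          break;
        case APPEND:
          break;
      }
    }
    return toDelete;
  }
}
```

The scan only needs the destination paths, not the full upload metadata, so it may be possible to keep its memory footprint well below that of loading every pending upload at once.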

The other committers can react to failures during incremental commits more simply:

1. Abort all pending MPUs under the output directory.
2. Delete all files under the output directory.

I'll have to think about how best to restructure the code to do this, but it is possible.
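
For illustration only, the two-step rollback for the other committers could be as small as the sketch below. The `Store` interface is entirely hypothetical (it is not the S3A filesystem API); it just names the three operations the steps rely on.

```java
import java.util.List;

public class RollbackSketch {

  /** Hypothetical store abstraction; not the S3A API. */
  public interface Store {
    List<String> listPendingUploads(String dir);
    void abortUpload(String uploadId);
    void deleteRecursive(String dir);
  }

  /**
   * Step 1: abort every pending multipart upload under the output directory.
   * Step 2: delete everything under the output directory.
   * Returns the number of uploads aborted.
   */
  public static int rollback(Store store, String outputDir) {
    int aborted = 0;
    for (String uploadId : store.listPendingUploads(outputDir)) {
      store.abortUpload(uploadId);
      aborted++;
    }
    store.deleteRecursive(outputDir);
    return aborted;
  }
}
```

The appeal of this form is that it needs no record of which files were already committed: it works from the state of the store, not from an in-memory list.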


> S3A committers leak threads/raises OOM on job/task commit at scale
> ------------------------------------------------------------------
>
>                 Key: HADOOP-16570
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16570
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0, 3.1.2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> The fixed size ThreadPool created in AbstractS3ACommitter doesn't get cleaned up at EOL; as a result you leak the no. of threads set in "fs.s3a.committer.threads"
> Not visible in MR/distcp jobs, but ultimately causes OOM on Spark

