Date: Wed, 22 Feb 2017 11:21:44 +0000 (UTC)
From: "Steve Loughran (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

    [ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878025#comment-15878025 ]

Steve Loughran commented on HADOOP-14028:
-----------------------------------------

Checkstyle complaints unimportant

{code}
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java:382:        : writeOperationHelper.newPutRequest(uploadData.getUploadStream(), size);: Line is longer than 80 characters (found 81).
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ADataBlocks.java:212:    protected final long index;:26: Variable 'index' must be private and have accessor methods.
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ADataBlocks.java:213:    protected final S3AInstrumentation.OutputStreamStatistics statistics;:63: Variable 'statistics' must be private and have accessor methods.
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:1:/*: File length is 2,374 lines (max allowed is 2,000)
{code}

# 81 chars is acceptable for readability, IMO.
# The visible fields are all final, and they are in package-private classes. We are in control of where these fields are referenced; wrapping them in accessors is needless.
# File length is insurmountable (and getting worse). I've already moved stuff out of S3AFileSystem (e.g. all the listing code); I'm not sure what else can be done. I worry more about class complexity, especially with the S3Guard changes, than about overall length.

This patch is ready for review, and it is important. Volunteers?

> S3A block output streams don't delete temporary files in multipart uploads
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-14028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>         Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Assignee: Steve Loughran
>            Priority: Critical
>         Attachments: HADOOP-14028-006.patch, HADOOP-14028-007.patch, HADOOP-14028-branch-2-001.patch, HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, HADOOP-14028-branch-2.8-004.patch, HADOOP-14028-branch-2.8-005.patch, HADOOP-14028-branch-2.8-007.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I was looking for after running into the same OOM problems) and don't see it cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process starts. My expectation is that local copies of the blocks would be deleted after those parts finish uploading, but I'm seeing more than 15 blocks in /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed after individual blocks have finished uploading or when the entire file has been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, sorting by atime, and deleting anything older than the first 20: `ls -ut | tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call that the AWS httpclient makes on the input stream triggers the deletion. Though there aren't tests for it, as I recall.
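The reply quoted in the issue describes the cleanup mechanism: a disk block's temporary file is deleted when the HTTP client calls close() on the input stream it was given for that part. The following is a minimal Java sketch of that delete-on-close pattern, assuming a plain file-backed block. `DeleteOnCloseInputStream` and `DeleteOnCloseDemo` are hypothetical names for illustration only; this is not the actual S3A `DiskBlock` code, and the read loop merely stands in for the client streaming a part to S3.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Hypothetical sketch, not the real S3A DiskBlock code: a stream over a
// temporary block file that deletes the file once the stream is closed.
class DeleteOnCloseInputStream extends FilterInputStream {
  private final File file;
  private boolean closed;

  DeleteOnCloseInputStream(File file) throws FileNotFoundException {
    super(new FileInputStream(file));
    this.file = file;
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return; // callers may close more than once; clean up only the first time
    }
    closed = true;
    try {
      super.close(); // release the file handle before deleting the file
    } finally {
      // Delete the block file even if closing the underlying stream failed.
      Files.deleteIfExists(file.toPath());
    }
  }
}

// Hypothetical demo: the read loop stands in for the HTTP client that
// streams one part of a multipart upload.
class DeleteOnCloseDemo {
  public static void main(String[] args) throws IOException {
    File block = File.createTempFile("s3ablock-", ".tmp");
    Files.write(block.toPath(), new byte[] {1, 2, 3});
    try (InputStream in = new DeleteOnCloseInputStream(block)) {
      while (in.read() != -1) {
        // consume the block, as an upload would
      }
    } // leaving the try block closes the stream, which deletes the file
    System.out.println(block.exists()); // expected: false
  }
}
```

In this sketch each block gets its own stream, so each temporary file disappears as soon as that part's stream is closed, which is the per-part behaviour the reporter expected. If nothing ever closes the per-part stream, the files accumulate exactly as described in the issue; a check like the demo's final `exists()` call is the kind of test the reply notes is missing.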