Return-Path: X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 549AD18649 for ; Thu, 28 Apr 2016 20:13:15 +0000 (UTC) Received: (qmail 38210 invoked by uid 500); 28 Apr 2016 20:13:13 -0000 Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org Received: (qmail 38037 invoked by uid 500); 28 Apr 2016 20:13:13 -0000 Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-issues@hadoop.apache.org Delivered-To: mailing list mapreduce-issues@hadoop.apache.org Received: (qmail 37733 invoked by uid 99); 28 Apr 2016 20:13:13 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Apr 2016 20:13:13 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 19CF12C1F74 for ; Thu, 28 Apr 2016 20:13:13 +0000 (UTC) Date: Thu, 28 Apr 2016 20:13:13 +0000 (UTC) From: "Mariappan Asokan (JIRA)" To: mapreduce-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-6628: ---------------------------------------- Attachment: MAPREDUCE-6628.005.patch > Potential memory leak in CryptoOutputStream > ------------------------------------------- > > Key: MAPREDUCE-6628 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security > Reporter: Mariappan Asokan > Assignee: Mariappan Asokan > Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch > > > There is a potential memory leak in {{CryptoOutputStream.java.}} It allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that get freed when {{close()}} method is called. Most of the time, {{close()}} method is called. However, when writing to intermediate Map output file or the spill files in {{MapTask}}, {{close()}} is never called since calling so would close the underlying stream which is not desirable. There is a single underlying physical stream that contains multiple logical streams one per partition of Map output. > By default the amount of memory allocated per byte buffer is 128 KB and so the total memory allocated is 256 KB, This may not sound much. However, if the number of partitions (or number of reducers) is large (in the hundreds) and/or there are spill files created in {{MapTask}}, this can grow into a few hundred MB. > I can think of two ways to address this issue: > h2. Possible Fix - 1 > According to JDK documentation: > {quote} > The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance. > {quote} > It is not clear to me whether there is any benefit of allocating direct byte buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU overhead in moving data from {{outBuffer}} to a temporary byte array as per the following code in {{CryptoOutputStream.java}}. > {code} > /* > * If underlying stream supports {@link ByteBuffer} write in future, needs > * refine here. > */ > final byte[] tmp = getTmpBuf(); > outBuffer.get(tmp, 0, len); > out.write(tmp, 0, len); > {code} > Even if the underlying stream supports direct byte buffer IO (or direct IO in OS parlance), it is not clear whether it will yield any measurable performance gain. > The fix would be to allocate a ByteBuffer on the heap for inBuffer and wrap a byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, the {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} as demanded by the {{encrypt()}} method in {{Encryptor}}. > h2. Possible Fix - 2 > Assuming that we want to keep the buffers as direct byte buffers, we can create a new constructor to {{CryptoOutputStream}} and pass a boolean flag {{ownOutputStream}} to indicate whether the underlying stream will be owned by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method will close the underlying stream. Otherwise, when {{close()}} is called only the direct byte buffers will be freed and the underlying stream will not be closed. > The scope of changes for this fix will be somewhat wider. We need to modify {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} as well to pass the ownership flag mentioned above. > I can post a patch for either of the above. I welcome any other ideas from developers to fix this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)