Date: Tue, 19 Mar 2013 21:21:16 +0000 (UTC)
From: "Robert Joseph Evans (JIRA)"
To: common-dev@hadoop.apache.org
Subject: [jira] [Resolved] (HADOOP-9419) CodecPool should avoid OOMs with buggy codecs

     [ https://issues.apache.org/jira/browse/HADOOP-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved HADOOP-9419.
-----------------------------------------
    Resolution: Won't Fix

Never mind. I created a patch, and it is completely useless in fixing this problem. The tasks still OOM because the codec object itself is so small, and the MergeManager creates new codecs so quickly, that on a job with lots of reduces it literally uses up all of the address space with direct byte buffers. Some of the processes are killed by the NM for exceeding the virtual address space limit before they even OOM.

We could try to have the CodecPool detect that a codec is doing the wrong thing and "correct" it on the codec's behalf, but in my opinion that is too heavy-handed.

> CodecPool should avoid OOMs with buggy codecs
> ---------------------------------------------
>
>                 Key: HADOOP-9419
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9419
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>
> I recently found a bug in the gpl compression libraries that was causing map tasks for a particular job to OOM:
> https://github.com/omalley/hadoop-gpl-compression/issues/3
> Now, granted, it does not make much sense for a job to use the LzopCodec for map output compression instead of the LzoCodec, but arguably other codecs could be doing similar things and causing the same sort of memory leaks. I propose that we do a sanity check when creating a new decompressor/compressor: if the codec's newly created object does not match the value from getType... it should turn off caching for that Codec.
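
For illustration, a minimal sketch of the sanity check the issue proposes, assuming "getType..." refers to CompressionCodec.getDecompressorType(). The SanityCheckedPool class and its cachingDisabled set are hypothetical, not part of the real CodecPool API:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Decompressor;

// Hypothetical sketch (not the real CodecPool): stop pooling for any codec
// whose createDecompressor() returns an object that does not match
// getDecompressorType(), so a buggy codec cannot silently defeat the cache.
public class SanityCheckedPool {

  // Codec classes we refuse to cache for. Hypothetical state; the real
  // CodecPool keeps no such set.
  private static final Set<Class<?>> cachingDisabled =
      ConcurrentHashMap.newKeySet();

  public static Decompressor getDecompressor(CompressionCodec codec) {
    // A real pool would first look for a cached instance here.
    Decompressor decomp = codec.createDecompressor();
    // The proposed sanity check: the created object should be an
    // instance of the advertised decompressor type.
    if (decomp != null && !codec.getDecompressorType().isInstance(decomp)) {
      cachingDisabled.add(codec.getClass());
    }
    return decomp;
  }

  public static void returnDecompressor(CompressionCodec codec,
                                        Decompressor decomp) {
    if (cachingDisabled.contains(codec.getClass())) {
      decomp.end();  // release native/direct resources instead of caching
      return;
    }
    // ... otherwise return it to the per-type pool as CodecPool does ...
  }
}
{code}

As the resolution above explains, a check like this turned out not to help: the direct buffers are allocated when the codec objects are created, so refusing to cache them does nothing to stop a fast creator like the MergeManager from exhausting the address space first.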