Date: Sat, 23 Mar 2013 04:15:16 +0000 (UTC)
From: "David Parks (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-9295) AbstractMapWritable throws exception when calling readFields() multiple times when the maps contain different class types

    [ https://issues.apache.org/jira/browse/HADOOP-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611557#comment-13611557 ]

David Parks commented on HADOOP-9295:
-------------------------------------

Haha, this gets confusing; you had me thinking I had totally muffed up that test case for a few minutes.

The test case, and the reason I created 2 *different* MapWritable objects, serves to emulate what is happening in a Hadoop Map/Reduce phase (where I actually ran into the bug).

Take this example MapReduce job with 2 mappers:

    MapperX adds a new MapWritable with: (key -> "testKey1", value -> CustomWritableOne)
    MapperY adds a new MapWritable with: (key -> "testKey2", value -> CustomWritableTwo)

Now the mappers would have done:

    MapperX: context.write( new Text("commonkey"), mapWritableWithTestKey1 );
    MapperY: context.write( new Text("commonkey"), mapWritableWithTestKey2 );

In the reducer we now have an Iterator of MapWritable objects that contain 2 distinct object types that we wanted to join on "commonkey". The use of testKey1 vs. testKey2 as the MapWritable key allows our reducer to identify which data type is contained in each MapWritable. This seems like a quick way to join two differently typed Writable objects for processing in the reducer.

The problem happens when we read mapWritableWithTestKey1 in, then use the same object to read mapWritableWithTestKey2 in, which is exactly what the reducer is doing.

> AbstractMapWritable throws exception when calling readFields() multiple times when the maps contain different class types
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9295
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1.0.3
>            Reporter: David Parks
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: MapWritableBugTest.java, test-hadoop-9295.patch
>
>
> Verified the trunk looks the same as 1.0.3 for this issue.
> When mappers output MapWritables with different class types, then they are read in on the Reducer via an iterator (multiple calls to readFields without instantiating a new object) you'll get this:
> java.lang.IllegalArgumentException: Id 1 exists but maps to org.me.ClassTypeOne and not org.me.ClassTypeTwo
>         at org.apache.hadoop.io.AbstractMapWritable.addToMap(AbstractMapWritable.java:73)
>         at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:201)
> It happens because AbstractMapWritable accumulates class type entries in its ClassType to ID (and vice versa) hashmaps.
> Those accumulating classtype-to-id hashmaps need to be cleared to support multiple calls to readFields().
> I've attached a JUnit test that both demonstrates the problem and contains an embedded, fixed version of MapWritable and ArrayMapWritable (note the //TODO comments in the code where it was fixed in 2 places).
> If there's a better way to submit this recommended bug fix, someone please feel free to let me know.
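
For anyone following along without Hadoop on the classpath, the accumulation problem described above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions: ToyMapWritable, its clearOnRead flag, readTwice() and the two empty stand-in classes are hypothetical names made up for illustration, and the "serialized header" is simplified to a map of id -> class. It is not the real AbstractMapWritable code, and the real class may assign ids differently. It only shows why reusing one instance for two reads fails when the id tables are never cleared, and why clearing them at the start of each read avoids the exception.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassIdReuseSketch {

    // Empty stand-ins for the two value types written by MapperX and MapperY.
    static class CustomWritableOne {}
    static class CustomWritableTwo {}

    /** Simplified model of the class<->id bookkeeping; NOT Hadoop's real code. */
    static class ToyMapWritable {
        private final Map<Class<?>, Byte> classToId = new HashMap<>();
        private final Map<Byte, Class<?>> idToClass = new HashMap<>();
        private final boolean clearOnRead; // hypothetical switch: true models the proposed fix

        ToyMapWritable(boolean clearOnRead) {
            this.clearOnRead = clearOnRead;
        }

        private void addToMap(Class<?> clazz, byte id) {
            Class<?> existing = idToClass.get(id);
            if (existing != null && !existing.equals(clazz)) {
                // Same shape of message as in the reported stack trace.
                throw new IllegalArgumentException("Id " + id + " exists but maps to "
                        + existing.getName() + " and not " + clazz.getName());
            }
            classToId.put(clazz, id);
            idToClass.put(id, clazz);
        }

        /** Models readFields(): each serialized map carries its own id -> class table. */
        void readFields(Map<Byte, Class<?>> serializedHeader) {
            if (clearOnRead) {
                // The fix under discussion: forget entries left over from the previous read.
                classToId.clear();
                idToClass.clear();
            }
            for (Map.Entry<Byte, Class<?>> entry : serializedHeader.entrySet()) {
                addToMap(entry.getValue(), entry.getKey());
            }
        }
    }

    /** Reads two differently typed maps into ONE reused instance; true if both reads succeed. */
    public static boolean readTwice(boolean clearOnRead) {
        Map<Byte, Class<?>> fromMapperX = new HashMap<>();
        fromMapperX.put((byte) 1, CustomWritableOne.class);
        Map<Byte, Class<?>> fromMapperY = new HashMap<>();
        fromMapperY.put((byte) 1, CustomWritableTwo.class); // same id, different class

        ToyMapWritable reused = new ToyMapWritable(clearOnRead);
        try {
            reused.readFields(fromMapperX);
            reused.readFields(fromMapperY);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without clearing: " + readTwice(false)); // prints "without clearing: false"
        System.out.println("with clearing:    " + readTwice(true));  // prints "with clearing:    true"
    }
}
```

The second read fails in the uncleared case because id 1 is still bound to the first mapper's class. That mirrors the reducer reusing one MapWritable instance across the values for "commonkey".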