hadoop-common-issues mailing list archives

From "David Parks (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9295) AbstractMapWritable throws exception when calling readFields() multiple times when the maps contain different class types
Date Sat, 23 Mar 2013 04:15:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611557#comment-13611557 ]

David Parks commented on HADOOP-9295:
-------------------------------------

Haha, this gets confusing; you had me thinking I'd totally muffed up that test case for a few
minutes.

The test case, and the reason I created two *different* MapWritable objects, serves to emulate
what happens in a Hadoop Map/Reduce job (which is where I actually ran into the bug).

Take this example MapReduce job with 2 mappers:

 MapperX adds a new MapWritable with: (key -> "testKey1", value -> CustomWritableOne)
 MapperY adds a new MapWritable with: (key -> "testKey2", value -> CustomWritableTwo)

Now the mappers would have done:

 MapperX: context.write( new Text("commonkey"), mapWritableWithTestKey1 );
 MapperY: context.write( new Text("commonkey"), mapWritableWithTestKey2 );

In the reducer we now have an Iterator over MapWritable objects containing two distinct value
types that we want to join on "commonkey".

The use of testKey1 vs. testKey2 as the MapWritable key allows our reducer to identify which
data type is contained in this MapWritable. This seems like a quick way to join two differently
typed Writable objects for processing in the reducer.

The problem occurs when we read mapWritableWithTestKey1 into a MapWritable, then reuse the same
object to read mapWritableWithTestKey2 in, which is exactly what the reducer's value iterator does.
                
> AbstractMapWritable throws exception when calling readFields() multiple times when the maps contain different class types
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9295
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1.0.3
>            Reporter: David Parks
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: MapWritableBugTest.java, test-hadoop-9295.patch
>
>
> Verified the trunk looks the same as 1.0.3 for this issue.
> When mappers output MapWritables with different class types and they are read in on the Reducer via an iterator (multiple calls to readFields() without instantiating a new object), you'll get this:
> java.lang.IllegalArgumentException: Id 1 exists but maps to org.me.ClassTypeOne and not org.me.ClassTypeTwo
>         at org.apache.hadoop.io.AbstractMapWritable.addToMap(AbstractMapWritable.java:73)
>         at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:201)
> It happens because AbstractMapWritable accumulates class-type entries in its class-to-id (and id-to-class) hashmaps.
> Those accumulating class-to-id hashmaps need to be cleared to support multiple calls to readFields().
> I've attached a JUnit test that both demonstrates the problem and contains an embedded, fixed version of MapWritable and ArrayMapWritable (note the //TODO comments in the code where it was fixed in two places).
> If there's a better way to submit this recommended bug fix, someone please feel free to let me know.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
