hadoop-mapreduce-issues mailing list archives

From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-1347) Missing synchronization in MultipleOutputFormat
Date Fri, 24 Jun 2011 19:07:49 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054631#comment-13054631 ]

jiraposter@reviews.apache.org commented on MAPREDUCE-1347:

This is an automatically generated e-mail. To reply, visit:

Review request for hadoop-mapreduce and Todd Lipcon.


Used makeComputingMap from Guava's MapMaker to provide a thread-safe way of creating a
RecordWriter cache.

For some reason, the map is not actually caching the writers; instead it calls apply() over
and over again for the same key-value pairs.
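
For comparison, here is a minimal sketch of the caching behaviour the patch is after: the factory should run once per key even when several threads ask for the same writer at the same time. This is not the code in the diff. It swaps in ConcurrentHashMap.computeIfAbsent from the JDK (Java 8 and later) in place of Guava's makeComputingMap, and FakeWriter, WriterCacheSketch and createdAfterConcurrentLookups are hypothetical names used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class WriterCacheSketch {

    // Hypothetical stand-in for a RecordWriter; this is NOT the Hadoop interface.
    static final class FakeWriter {
        final String path;
        FakeWriter(String path) { this.path = path; }
    }

    private final ConcurrentMap<String, FakeWriter> writers = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    // Returns the cached writer for the path, creating it if it is absent.
    // computeIfAbsent applies the factory at most once per absent key.
    FakeWriter getWriter(String path) {
        return writers.computeIfAbsent(path, p -> {
            created.incrementAndGet();
            return new FakeWriter(p);
        });
    }

    int createdCount() {
        return created.get();
    }

    // Starts the given number of threads, releases them together, and has each
    // one look up the same path. Returns how many writers were actually created.
    static int createdAfterConcurrentLookups(int threads, String path) {
        WriterCacheSketch cache = new WriterCacheSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.getWriter(path);
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return cache.createdCount();
    }

    public static void main(String[] args) {
        System.out.println(createdAfterConcurrentLookups(8, "part-a"));
    }
}
```

Lookups for different paths do not serialize on a single lock here, which is the fine-grained behaviour the issue description asks for. A test along these lines (count the factory invocations per key) is one way to check whether a cache is really caching.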

This addresses bug MAPREDUCE-1347.


  mapreduce/ivy.xml 85ee014 
  mapreduce/ivy/libraries.properties 9d40aaa 
  mapreduce/src/java/org/apache/hadoop/mapred/lib/MultipleOutputFormat.java b8944f1 
  mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMultipleTextOutputFormat.java 14c097d

Diff: https://reviews.apache.org/r/953/diff


Added a test case, but it fails with the current behavior of MapMaker's makeComputingMap()
(it would pass if the map cached the writers as expected).



> Missing synchronization in MultipleOutputFormat
> -----------------------------------------------
>                 Key: MAPREDUCE-1347
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1347
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-1347.r2.diff, MAPREDUCE-1347.r3.diff, mapreduce.1347.r1.diff
> MultipleOutputFormat's RecordWriter implementation doesn't use synchronization when accessing
> the recordWriters member. When using multithreaded mappers or reducers, this can result in
> problems where two threads will both try to create the same file, causing AlreadyBeingCreatedException.
> Doing this more fine-grained than just synchronizing the whole method is probably a good idea,
> so that multithreaded mappers can actually achieve parallelism writing into separate output
> From what I can tell, the new API's MultipleOutputs seems not to have this issue.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

