incubator-crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CRUNCH-73) Scrunch applications using PipelineApp do not properly serialize closures to MapReduce tasks.
Date Sat, 22 Sep 2012 00:33:08 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-73.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0

Fixed. Thanks Kiyan!
                
> Scrunch applications using PipelineApp do not properly serialize closures to MapReduce tasks.
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-73
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-73
>             Project: Crunch
>          Issue Type: Bug
>          Components: Scrunch
>    Affects Versions: 0.4.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Kiyan Ahmadizadeh
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-73-v1.patch, CRUNCH-73-v2.patch
>
>
> One of the great potential advantages of using Scala for writing MapReduce pipelines
> is the ability to send side data as part of function closures, rather than through Hadoop
> Configurations or the Distributed Cache.  As an absurdly simple example, consider the following
> Scala PipelineApp that divides all elements of a numeric PCollection by an arbitrary argument:
> object DivideApp extends PipelineApp {
>   val divisor = Integer.valueOf(args(0))
>   val nums = read(From.textFile("numbers.txt"))
>   val dividedNums = nums.map { n => n / divisor }
>   dividedNums.write(To.textFile("dividedNums"))
>   run()
> }
> Executing this PipelineApp fails.  MapReduce tasks get a value of "null" for divisor
> (or 0 if divisor is forced to be a primitive numeric type).  This indicates that an error
> is occurring in the serialization of Scala function closures that causes unbound variables
> in the closure to take on their default JVM values.
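
One plausible mechanism for the symptom described above, offered as an assumption rather than as the confirmed root cause of CRUNCH-73: if PipelineApp defers the object body the way scala.App does (through DelayedInit), then divisor is a field of the singleton object and is only assigned when that deferred body runs. A JVM that loads the object without running the body sees the field's default value. The sketch below is Scala 2 only and uses no Crunch API; DeferredBody, DivideDemo and runBody are hypothetical names made up for this illustration.

```scala
// Scala 2 only: DelayedInit is deprecated and is not supported by Scala 3.
// DeferredBody is a hypothetical stand-in for an App-style trait.
trait DeferredBody extends DelayedInit {
  private var body: Option[() => Unit] = None

  // The compiler hands the whole body of an inheriting object to this
  // method instead of running it in the object's constructor.
  override def delayedInit(code: => Unit): Unit = body = Some(() => code)

  // Stand-in for what main() would do on the client side.
  def runBody(): Unit = body.foreach(_.apply())
}

object DivideDemo extends DeferredBody {
  val divisor: Integer = Integer.valueOf(4)
  val primitiveDivisor: Int = 4
}

object Main {
  def main(args: Array[String]): Unit = {
    // Comparable to a task JVM: the object is loaded, its body never ran.
    println(DivideDemo.divisor)          // null
    println(DivideDemo.primitiveDivisor) // 0

    // Comparable to the client JVM after main() has executed the body.
    DivideDemo.runBody()
    println(DivideDemo.divisor)          // 4
  }
}
```

If this is what happens, a function literal such as { n => n / divisor } does not carry the value of divisor with it. It only refers back to the singleton object, which would match the "null" and 0 values reported for the tasks.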

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
