crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-90) Object reuse is not accounted for in mapper fusion
Date Sun, 07 Oct 2012 21:46:03 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471333#comment-13471333 ]

Gabriel Reid commented on CRUNCH-90:
------------------------------------

I've just discovered that this patch breaks the PageRankClassTest in scrunch. Unfortunately,
I only ran the crunch integration tests (instead of the full suite) before committing it.
The exception being thrown is as follows:

java.lang.IllegalArgumentException: Can not set final [Ljava.lang.String; field org.apache.crunch.scrunch.PageRankData.urls to org.apache.avro.generic.GenericData$Array

It appears that the deep copy fails when it reads a PageRankData object back in using
reflection-based serialization. I'm not sure why this is only surfacing now; I would have
expected the existing serialization logic to hit the same problem even without deep copying.
This appears to be a bug in Avro (similar to AVRO-1046).
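
The exception message above has the shape produced by a reflective field write with an incompatible value type, and that part can be reproduced without Avro. The following is a minimal sketch in plain Java, not Avro's or Crunch's code: the `Page` class is a hypothetical stand-in for PageRankData, and a `java.util.ArrayList` stands in for `GenericData$Array` (which is also a `List`). It assumes the failing step is a `Field.set` call on a `String[]` field.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FinalArrayFieldSketch {

    // Hypothetical stand-in for PageRankData: a final, array-typed field.
    static class Page {
        final String[] urls;
        Page(String[] urls) { this.urls = urls; }
    }

    // Tries to assign `value` to Page.urls via reflection.
    // Returns true on success, false if the value type is rejected.
    static boolean trySet(Page page, Object value) throws Exception {
        Field field = Page.class.getDeclaredField("urls");
        field.setAccessible(true); // allows writing a final instance field
        try {
            field.set(page, value);
            return true;
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Page page = new Page(new String[0]);
        List<String> listValue = new ArrayList<>(Arrays.asList("a", "b"));

        // A List cannot be assigned to a String[] field, so this is rejected.
        System.out.println(trySet(page, listValue));

        // Converting the list to a String[] first makes the write succeed.
        System.out.println(trySet(page, listValue.toArray(new String[0])));
    }
}
```

If this is indeed the mechanism, either converting the list to an array before the write or declaring the field with a list type would avoid the mismatch; which of these is appropriate for Avro or for the scrunch test is not settled here.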

[~jwills] I'm totally clueless when it comes to Scala -- any chance you could take a look
and possibly try changing the type of the urls field to something that Avro can deal with,
or give me any pointers on what you think might be going on here? I definitely want to report
this to the Avro JIRA as well if it is indeed an Avro bug, but it would be good to work around
it for now.
                
> Object reuse is not accounted for in mapper fusion
> --------------------------------------------------
>
>                 Key: CRUNCH-90
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-90
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-90.patch
>
>
> When multiple DoFns are run over the same output (i.e. in the case of mapper fusion),
> the same value object is passed to multiple underlying DoFns. If the state of that value object
> is changed by one DoFn, other DoFns are called with the updated object.
> This is a situation that can happen quite easily when the input of a DoFn is simply updated
> and then emitted. In general, this bug will only affect values whose type is the same as the
> underlying serialization type.
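
The behaviour described in the issue can be illustrated outside of Crunch. The sketch below is plain Java and does not use the Crunch API or reproduce the attached patch: `Counter`, `runFused`, and `sampleFns` are hypothetical names, the functions stand in for DoFns, and `copy()` is a simplified stand-in for the deep copying mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ObjectReuseSketch {

    // Hypothetical mutable value type that a function can update in place.
    static class Counter {
        int value;
        Counter(int value) { this.value = value; }
        Counter copy() { return new Counter(value); }
    }

    // Applies every function to the same input, as fused functions would be.
    // When copyInput is true, each function receives its own copy instead.
    static List<Integer> runFused(Counter input,
                                  List<Function<Counter, Counter>> fns,
                                  boolean copyInput) {
        List<Integer> results = new ArrayList<>();
        for (Function<Counter, Counter> fn : fns) {
            Counter arg = copyInput ? input.copy() : input;
            results.add(fn.apply(arg).value);
        }
        return results;
    }

    // Two functions that each update their input and then emit it.
    static List<Function<Counter, Counter>> sampleFns() {
        List<Function<Counter, Counter>> fns = new ArrayList<>();
        fns.add(c -> { c.value += 1; return c; });
        fns.add(c -> { c.value *= 10; return c; });
        return fns;
    }

    public static void main(String[] args) {
        // Shared object: the second function sees the first one's update.
        System.out.println(runFused(new Counter(1), sampleFns(), false)); // [2, 20]

        // Copied per function: each one sees the original value.
        System.out.println(runFused(new Counter(1), sampleFns(), true));  // [2, 10]
    }
}
```

Copying per function trades extra allocation for isolation between the functions; whether the actual fix copies in every case or only where needed is not stated in this message.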

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
