crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-350) Non-serializable BloomFilter field in BloomFilterJoinStrategy should be marked transient
Date Fri, 21 Feb 2014 08:06:20 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908080#comment-13908080 ]

Gabriel Reid commented on CRUNCH-350:
-------------------------------------

[~jwills] Any idea how this bug was getting triggered? The BloomFilter in question is only
created in the initialize() method, so the occurrence of this bug means that initialize()
is being called more than once on the same DoFn. The DoFn in question is created within the
BloomFilterJoinStrategy#join method, so there's no way that it could be reused by some other
code as far as I can see.
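With plain Java serialization, a NotSerializableException from this kind of field can only appear if the DoFn instance is serialized *after* initialize() has populated the field. The minimal, self-contained sketch below illustrates that timing. It does not use Crunch's real DoFn or BloomFilter classes: NonSerializableFilter and FilterFn are hypothetical stand-ins, and the field is deliberately left non-transient.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationTiming {

  // Hypothetical stand-in for a non-serializable object such as a Bloom filter.
  static class NonSerializableFilter {
    boolean mightContain(String key) { return true; }
  }

  // Hypothetical DoFn-like class; the field is NOT marked transient.
  static class FilterFn implements Serializable {
    private static final long serialVersionUID = 1L;
    private NonSerializableFilter filter;

    // Mirrors the pattern described above: the field is only created here.
    void initialize() {
      filter = new NonSerializableFilter();
    }
  }

  // Returns true if the object can be written with Java serialization.
  static boolean serializes(Object o) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(o);
      return true;
    } catch (NotSerializableException e) {
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    FilterFn fn = new FilterFn();
    // Before initialize(): the field is null, so serialization succeeds.
    System.out.println("before initialize(): " + serializes(fn));
    fn.initialize();
    // After initialize(): the non-serializable field is populated, so it fails.
    System.out.println("after initialize(): " + serializes(fn));
  }
}
```

Running it prints `true` and then `false`, which is why hitting this exception suggests the same DoFn is being serialized after it has already been initialized.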

The BloomFilter should definitely be transient, but it feels to me like this could be a sign
of another issue that should be looked into (or it just means that I'm confused about something).
I think that there are some other places in the code where non-serializable fields may not
be marked as transient on the assumption that initialize() will only be called once, and I
have the feeling that initialize() probably shouldn't be called multiple times on the same DoFn.
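As a rough illustration of what marking the field transient buys (again using hypothetical stand-in classes, not Crunch's actual code), the sketch below serializes an already-initialized FilterFn without error. The transient field comes back as null on the deserialized copy, so it only becomes usable again once initialize() runs on that copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldFix {

  // Hypothetical stand-in for a non-serializable object such as a Bloom filter.
  static class NonSerializableFilter {
    boolean mightContain(String key) { return true; }
  }

  // Hypothetical DoFn-like class with the field marked transient.
  static class FilterFn implements Serializable {
    private static final long serialVersionUID = 1L;
    // transient: skipped by Java serialization, so a populated filter
    // no longer makes the whole object unserializable.
    private transient NonSerializableFilter filter;

    void initialize() {
      filter = new NonSerializableFilter();
    }

    boolean isInitialized() { return filter != null; }
  }

  // Serializes and deserializes an object, returning the copy.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    FilterFn fn = new FilterFn();
    fn.initialize();
    // No NotSerializableException, even though initialize() already ran.
    FilterFn copy = roundTrip(fn);
    System.out.println("copy initialized: " + copy.isInitialized());
    // The transient field must be rebuilt on the deserialized copy.
    copy.initialize();
    System.out.println("after initialize(): " + copy.isInitialized());
  }
}
```

Note that transient only hides the symptom: if initialize() is really being called on a DoFn that has already been initialized and then serialized, that underlying behavior is still worth investigating.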

> Non-serializable BloomFilter field in BloomFilterJoinStrategy should be marked transient
> ----------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-350
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-350
>             Project: Crunch
>          Issue Type: Bug
>          Components: MapReduce Patterns
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: CRUNCH-350.patch
>
>
> Got a notice from the user mailing list that the BloomFilterJoinStrategy was throwing
> a NotSerializableException. I took a look at the code and noticed a DoFn field that should
> be marked as transient.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
