crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: JoinStrategy with Spark pipeline
Date Wed, 02 Sep 2015 22:56:15 GMT
Posted a patch to fix this here:
https://issues.apache.org/jira/browse/CRUNCH-557

On Wed, Sep 2, 2015 at 2:17 PM, Surbhi Mungre <mungre.surbhi@gmail.com>
wrote:

> I was trying to determine the effect of changing the JoinStrategy on a Spark
> pipeline. My pipeline works fine with DefaultJoinStrategy, but I could not
> get it working with MapsideJoinStrategy or BloomFilterJoinStrategy. With
> MapsideJoinStrategy I get an exception [1] on the driver itself, and with
> BloomFilterJoinStrategy I get exceptions [2] in one of the stages. I have
> not changed any configuration, but I did run tests with datasets of
> different sizes to ensure that my PCollection is small enough to fit in
> memory. I am running Spark in yarn-client mode with Crunch 0.11.0-cdh5.4.2.
>
> [1] https://gist.github.com/anonymous/15d6c691b743ad392d42
> [2] https://gist.github.com/anonymous/b02a82401a30a69f1cff
>
> Thanks,
> Surbhi
>
