crunch-user mailing list archives

From Surbhi Mungre
Subject JoinStrategy with Spark pipeline
Date Wed, 02 Sep 2015 21:17:54 GMT
I was trying to determine the effect of changing the JoinStrategy on a
Spark pipeline. My pipeline works fine with DefaultJoinStrategy, but I
could not get it to work with MapsideJoinStrategy or
BloomFilterJoinStrategy. With MapsideJoinStrategy I get an exception [1] on
the driver itself, and with BloomFilterJoinStrategy I get exceptions [2] in
one of the stages. I have not made any configuration changes, but I did run
tests with datasets of different sizes to ensure that my PCollection is
small enough to fit in memory. I am running Spark in yarn-client mode with
Crunch 0.11.0-cdh5.4.2.


