crunch-dev mailing list archives

From "Mikael Goldmann (JIRA)" <>
Subject [jira] [Created] (CRUNCH-655) DefaultJoinStrategy full outer join failing for spark pipeline
Date Thu, 17 Aug 2017 14:00:10 GMT
Mikael Goldmann created CRUNCH-655:

             Summary: DefaultJoinStrategy full outer join failing for spark pipeline
                 Key: CRUNCH-655
             Project: Crunch
          Issue Type: Bug
          Components: Spark
         Environment: Mac OS X with Crunch 0.13.0 and 0.15.0 (reproduced with the attached code); Ubuntu 14.04
(reproduction code not tried there, but a similar issue occurs in production with 0.13.0)
            Reporter: Mikael Goldmann

When the left and right tables in the join have entries with the same key, those entries are not
always joined together. The problem cannot be reproduced when the join runs with a single reducer, and it
happens more often when there are many reducers and very few entries per key on each side.

My guess is that the left value for key k sometimes ends up on a different reducer from the
right value for key k.
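
The hypothesis above can be illustrated with a small, hypothetical Java sketch (plain JDK only; it is not Crunch's DefaultJoinStrategy or its Spark code path). A reduce-side full outer join is only correct if every record with key k, from either side, is routed to the same partition. The names PartitionedJoinSketch, Partitioner, HASH and fullOuterJoin are made up for illustration, and each side holds at most one value per key to keep the example short. Deliberately shifting the right side's partitioner splits key "a" across two partitions, producing two half-null rows instead of one joined row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch, NOT Crunch's implementation: a reduce-side full outer join
 * over hash partitions. It is only correct if every record with key k, from either
 * side, is routed to the same partition.
 */
public class PartitionedJoinSketch {

    /** Routes a key to one of numPartitions reducers (hypothetical interface). */
    public interface Partitioner {
        int partition(String key, int numPartitions);
    }

    /** Deterministic partitioner: a given key always maps to the same partition. */
    public static final Partitioner HASH =
            (key, n) -> (key.hashCode() & Integer.MAX_VALUE) % n;

    /**
     * Full outer join where left records are routed with leftPart and right records with
     * rightPart. Each partition is joined on its own, as a reducer would; a key seen on only
     * one side of a partition gets "null" for the other side. Output is sorted for stable printing.
     */
    public static List<String> fullOuterJoin(Map<String, String> left, Map<String, String> right,
                                             int numPartitions, Partitioner leftPart, Partitioner rightPart) {
        List<Map<String, String>> leftBuckets = new ArrayList<>();
        List<Map<String, String>> rightBuckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            leftBuckets.add(new TreeMap<>());
            rightBuckets.add(new TreeMap<>());
        }
        // "Shuffle": send each record to the partition chosen by its side's partitioner.
        left.forEach((k, v) -> leftBuckets.get(leftPart.partition(k, numPartitions)).put(k, v));
        right.forEach((k, v) -> rightBuckets.get(rightPart.partition(k, numPartitions)).put(k, v));

        // "Reduce": each partition only sees the keys routed to it.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            Set<String> keys = new TreeSet<>(leftBuckets.get(i).keySet());
            keys.addAll(rightBuckets.get(i).keySet());
            for (String k : keys) {
                out.add(k + ":" + leftBuckets.get(i).get(k) + "," + rightBuckets.get(i).get(k));
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> left = Map.of("a", "L1", "b", "L2");
        Map<String, String> right = Map.of("a", "R1", "c", "R3");

        // Same partitioner on both sides: each key is joined in exactly one place.
        System.out.println(fullOuterJoin(left, right, 4, HASH, HASH));
        // prints [a:L1,R1, b:L2,null, c:null,R3]

        // Hypothetical mismatch: the right side is shifted by one partition, so "a" is split.
        Partitioner shifted = (key, n) -> (HASH.partition(key, n) + 1) % n;
        System.out.println(fullOuterJoin(left, right, 4, HASH, shifted));
        // prints [a:L1,null, a:null,R1, b:L2,null, c:null,R3]
    }
}
```

With numPartitions set to 1, the shifted partitioner maps every key to partition 0, so the mismatch cannot split a key. That is consistent with (but does not prove) the observation that a single reducer hides the problem.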

In my production case, the problem went away if I either used a single reducer or used cogroup instead of the join.

I've attached a class to reproduce the issue.

