crunch-user mailing list archives

From Bryan Baugher <bjb...@gmail.com>
Subject Re: Joins and null values
Date Wed, 18 Feb 2015 22:41:02 GMT
Maybe something like this:

PCollection<T>#join(PCollection<T>, JoinType) : PCollection<Pair<T, T>>

You could either add separate methods for the different join strategies
or take an enum parameter, as in the signature above.
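
As a rough illustration of the semantics such a method might have, here is
an in-memory sketch over plain Java lists. It is not Crunch code:
ElementJoinSketch, this JoinType enum, and the use of Map.Entry in place of
Pair are all assumptions made for the example. It rejects null elements,
along the lines of the exception idea in the quoted message below, because a
null on one side of a result pair is reserved to mean "no match".

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory illustration only; none of these names are Crunch API.
final class ElementJoinSketch {

    // Hypothetical enum of join strategies.
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // Joins two lists on element equality. A null on one side of a result
    // pair means "no match on that side", so null input elements are rejected.
    static <T> List<Map.Entry<T, T>> join(List<T> left, List<T> right, JoinType type) {
        // Group the right-hand elements so each left element finds its matches.
        Map<T, List<T>> rightByElement = new HashMap<>();
        for (T r : right) {
            rightByElement
                .computeIfAbsent(requireElement(r, "right"), k -> new ArrayList<>())
                .add(r);
        }
        boolean keepLeft = type == JoinType.LEFT_OUTER || type == JoinType.FULL_OUTER;
        boolean keepRight = type == JoinType.RIGHT_OUTER || type == JoinType.FULL_OUTER;

        List<Map.Entry<T, T>> out = new ArrayList<>();
        Set<T> leftSeen = new HashSet<>();
        for (T l : left) {
            leftSeen.add(requireElement(l, "left"));
            List<T> matches = rightByElement.get(l);
            if (matches != null) {
                for (T r : matches) {
                    out.add(new SimpleImmutableEntry<>(l, r));
                }
            } else if (keepLeft) {
                // Unmatched left element: pad the right side with null.
                out.add(new SimpleImmutableEntry<>(l, null));
            }
        }
        if (keepRight) {
            // Unmatched right elements: pad the left side with null.
            for (T r : right) {
                if (!leftSeen.contains(r)) {
                    out.add(new SimpleImmutableEntry<>(null, r));
                }
            }
        }
        return out;
    }

    // Fails fast on null input instead of silently producing a wrong join.
    private static <T> T requireElement(T element, String side) {
        if (element == null) {
            throw new IllegalArgumentException("null element in " + side + " input of join");
        }
        return element;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b");
        List<String> right = Arrays.asList("b", "c");
        System.out.println(join(left, right, JoinType.FULL_OUTER));
    }
}
```

A single method with an enum keeps the surface small; separate methods per
strategy would read more naturally at the call site but add more API.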

On Wed Feb 18 2015 at 3:58:38 PM Josh Wills <jwills@cloudera.com> wrote:

> Hey Bryan,
>
> I like the idea of throwing exceptions when there are null values in one
> of the collections in a join. I'm not sure yet whether that has other
> implications I should think through first.
>
> On the convenience methods for PCollection joins, what do you have in mind?
>
> J
>
>
> On Wed, Feb 18, 2015 at 12:35 PM, Bryan Baugher <bjbq4d@gmail.com> wrote:
>
>> Hi everyone,
>>
>> The other day I ran into the issue mentioned here[1] about joining data
>> with null values. It took a while to figure out, until I broke down and
>> checked the docs to see whether I was doing something obviously wrong. I
>> used null values because what I really want is to join two PCollections.
>>
>> Can Crunch either throw an exception or log errors if I do something like
>> this? Similarly, would it be possible to get convenience methods for doing
>> joins on PCollections?
>>
>> [1] - http://crunch.apache.org/user-guide.html#joins
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
