crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Joins and null values
Date Wed, 18 Feb 2015 22:54:43 GMT
If I got that right, then I think o.a.c.lib.Set does what you want. LMK.
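
For illustration, here is a minimal sketch of the semantics being discussed: find the values in collection A that are not in collection B, and fail loudly on nulls instead of silently producing odd results. It is written against plain Java lists so it runs without a cluster; the class and method names are hypothetical and this is not the Crunch implementation. As far as I recall, org.apache.crunch.lib.Set provides a comparable difference operation over PCollections, but check the Javadoc for the exact signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only; not part of Crunch.
class SetDifferenceSketch {

    // Returns the distinct values of `left` that do not occur in `right`,
    // in first-seen order. Null elements are rejected up front, mirroring
    // the "throw an exception on null values" idea from the thread.
    static <T> List<T> difference(List<T> left, List<T> right) {
        for (T v : left) {
            Objects.requireNonNull(v, "null value in left collection");
        }
        for (T v : right) {
            Objects.requireNonNull(v, "null value in right collection");
        }
        Set<T> exclude = new HashSet<>(right);
        Set<T> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T v : left) {
            // Keep v only if it is absent from `right` and not yet emitted.
            if (!exclude.contains(v) && seen.add(v)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b", "c", "a");
        List<String> b = Arrays.asList("b", "d");
        System.out.println(difference(a, b)); // prints [a, c]
    }
}
```

Rejecting nulls before doing any work is a design choice in this sketch: it surfaces the problem at the call site rather than in the output.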

On Wed, Feb 18, 2015 at 2:53 PM, Josh Wills <jwills@cloudera.com> wrote:

> Oh, I'm dumb-- you mean you want like a left-join like thing where you can
> find all values in collection A that aren't in collection B, etc., etc.?
>
> J
>
> On Wed, Feb 18, 2015 at 2:43 PM, Josh Wills <jwills@cloudera.com> wrote:
>
>> Different from o.a.c.lib.Cartesian.cross(PCollection<U> left,
>> PCollection<T> right, int parallelism) in some way?
>>
>> J
>>
>> On Wed, Feb 18, 2015 at 2:41 PM, Bryan Baugher <bjbq4d@gmail.com> wrote:
>>
>>>
>>> Maybe,
>>>
>>> PCollection<T>#join(PCollection<T>, JoinType) : PCollection<Pair<T, T>>
>>>
>>> You could make additional methods for the different join strategies or
>>> maybe an enum perhaps?
>>>
>>> On Wed Feb 18 2015 at 3:58:38 PM Josh Wills <jwills@cloudera.com> wrote:
>>>
>>>> Hey Bryan,
>>>>
>>>> I like the idea of throwing exceptions when there are null values in
>>>> one of the collections in a join. Not sure if there are any other
>>>> implications of that I should think through first.
>>>>
>>>> On the convenience methods for PCollection joins, what do you have in
>>>> mind?
>>>>
>>>> J
>>>>
>>>>
>>>> On Wed, Feb 18, 2015 at 12:35 PM, Bryan Baugher <bjbq4d@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> The other day I ran into the issue mentioned here[1] about joining
>>>>> data with null values. This took a while to figure out until I broke down
>>>>> and went to look at the docs to see if I was doing something obviously
>>>>> wrong. I used null values because I'm basically wanting to join two
>>>>> PCollections.
>>>>>
>>>>> Can crunch either throw an exception or log errors if I do something
>>>>> like this? Similarly would it be possible to get convenience methods for
>>>>> doing joins on PCollections?
>>>>>
>>>>> [1] - http://crunch.apache.org/user-guide.html#joins
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Director of Data Science
>>>> Cloudera <http://www.cloudera.com>
>>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>>
>>>
>>
>>
>>
>
>
>
>



