spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ian O'Connell" <...@ianoconnell.com>
Subject Re: .intersection() method on RDDs?
Date Fri, 24 Jan 2014 05:26:31 GMT
Is there any separation in the API between functions that can be built
solely on the existing exposed public API and ones which require access to
internals?

Just to maybe avoid bloat for composite functions like this that are for
user convenience?

(Ala something like lua's aux api vs core api?)


On Thu, Jan 23, 2014 at 8:33 PM, Matei Zaharia <matei.zaharia@gmail.com>wrote:

> I’d be happy to see this added to the core API.
>
> Matei
>
> On Jan 23, 2014, at 5:39 PM, Andrew Ash <andrew@andrewash.com> wrote:
>
> Ah right of course -- perils of typing code without running it!
>
> It feels like this is a pretty core operation that should be added to the
> main RDD API.  Do other people not run into this often?
>
> When I'm validating a foreign key join in my cluster I often check to make
> sure that the foreign keys land on valid values on the referenced table,
> and the way I do that is checking to see what percentage of the references
> actually land.
>
>
> On Thu, Jan 23, 2014 at 6:36 PM, Evan R. Sparks <evan.sparks@gmail.com>wrote:
>
>> Yup (well, with _._1 at the end!)
>>
>>
>> On Thu, Jan 23, 2014 at 5:28 PM, Andrew Ash <andrew@andrewash.com> wrote:
>>
>>> You're thinking like this?
>>>
>>> A.map(v => (v,None)).join(B.map(v => (v,None))).map(_._2)
>>>
>>>
>>> On Thu, Jan 23, 2014 at 6:26 PM, Evan R. Sparks <evan.sparks@gmail.com>wrote:
>>>
>>>> You could map each to an RDD[(String,None)] and do a join.
>>>>
>>>>
>>>> On Thu, Jan 23, 2014 at 5:18 PM, Andrew Ash <andrew@andrewash.com>wrote:
>>>>
>>>>> Hi spark users,
>>>>>
>>>>> I recently wanted to calculate the set intersection of two RDDs of
>>>>> Strings.  I couldn't find a .intersection() method in the autocomplete
or
>>>>> in the Scala API docs, so used a little set theory to end up with this:
>>>>>
>>>>> lazy val A = ...
>>>>> lazy val B = ...
>>>>> A.union(B).subtract(A.subtract(B)).subtract(B.subtract(A))
>>>>>
>>>>> Which feels very cumbersome.
>>>>>
>>>>> Does anyone have a more idiomatic way to calculate intersection?
>>>>>
>>>>> Thanks!
>>>>> Andrew
>>>>>
>>>>
>>>>
>>>
>>
>
>

Mime
View raw message