kudu-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Spark on Kudu
Date Tue, 01 Mar 2016 20:20:44 GMT
Hi Ben,

AFAIK no one in the dev community has committed to any timeline. I know of one
person on the Kudu Slack who's working on a better RDD, but that's about it.

Regards,

J-D

On Tue, Mar 1, 2016 at 11:00 AM, Benjamin Kim <bkim@amobee.com> wrote:

> Hi J-D,
>
> Quick question… Is there an ETA for KUDU-1214? I want to target a version
> of Kudu to begin real testing of Spark against it for our devs. At least, I
> can tell them what timeframe to anticipate.
>
> Just curious,
> *Benjamin Kim*
> *Data Solutions Architect*
>
> [a•mo•bee] *(n.)* the company defining digital marketing.
>
> *Mobile: +1 818 635 2900*
> 3250 Ocean Park Blvd, Suite 200  |  Santa Monica, CA 90405  |
> www.amobee.com
>
> On Feb 24, 2016, at 3:51 PM, Jean-Daniel Cryans <jdcryans@apache.org>
> wrote:
>
> The DStream stuff isn't there at all. I'm not sure if it's needed either.
>
> The kuduRDD just leverages the MR input format; ideally we'd use scans
> directly.
>
> The SparkSQL stuff is there but it doesn't do any sort of pushdown. It's
> really basic.
>
> The goal was to provide something for others to contribute to. We have
> some basic unit tests that others can easily extend. None of us on the team
> are Spark experts, but we'd be really happy to help anyone improve the
> kudu-spark code.
>
> J-D
>
> On Wed, Feb 24, 2016 at 3:41 PM, Benjamin Kim <bbuild11@gmail.com> wrote:
>
>> J-D,
>>
>> It looks like it fulfills most of the basic requirements (kudu RDD, kudu
>> DStream) in KUDU-1214. Am I right? Besides shoring up more Spark SQL
>> functionality (Dataframes) and doing the documentation, what more needs to
>> be done? Optimizations?
>>
>> I believe that it’s a good place to start using Spark with Kudu and
>> compare it to HBase with Spark (not clean).
>>
>> Thanks,
>> Ben
>>
>>
>> On Feb 24, 2016, at 3:10 PM, Jean-Daniel Cryans <jdcryans@apache.org>
>> wrote:
>>
>> AFAIK no one is working on it, but we did manage to get this in for
>> 0.7.0: https://issues.cloudera.org/browse/KUDU-1321
>>
>> It's a really simple wrapper, and yes you can use SparkSQL on Kudu, but
>> it will require a lot more work to make it fast/useful.
>>
>> Hope this helps,
>>
>> J-D
>>
>> On Wed, Feb 24, 2016 at 3:08 PM, Benjamin Kim <bbuild11@gmail.com> wrote:
>>
>>> I see this KUDU-1214 <https://issues.cloudera.org/browse/KUDU-1214> targeted
>>> for 0.8.0, but I see no progress on it. When this is complete, will this
>>> mean that Spark will be able to work with Kudu both programmatically and as
>>> a client via Spark SQL? Or is there more work that needs to be done on the
>>> Spark side for it to work?
>>>
>>> Just curious.
>>>
>>> Cheers,
>>> Ben
>>>
>>>
>>
>>
>
>
