From: Todd Lipcon
Date: Mon, 9 Jan 2017 21:26:30 -0800
Subject: Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'
To: user@kudu.apache.org
Reply-To: user@kudu.apache.org

On Mon, Jan 9, 2017 at 2:54 AM, Frank Heimerzheim <fh.ordix@gmail.com> wrote:
> Hello Todd,
>
> one additional question:
>
> There exists a KuduContext in org.apache.kudu.spark.kudu._ which provides read/write/update to be used with Scala and Spark. I'm now looking for a similar solution for Python and Spark. I've found https://github.com/bkvarda/iot_demo which looks fine at first glance, but I would much prefer an "official" solution. Is there anything to be expected in the near future? Or is there a way, one I don't know of yet, to use the Scala library from Python?

I'm not a real Spark expert (especially not pyspark) so I don't have a great answer to this question. The github demo you linked above looks like a reasonable approach, though.

Jordan Birdsell is our primary Python expert, and he filed https://issues.apache.org/jira/browse/KUDU-1603 a while back. Hopefully he will chime in with a better answer than I can give :)

-Todd
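
Editorial sketch, not part of the original reply: one possible way to reach the Scala library from Python is to go through Spark's generic DataFrame reader instead of KuduContext. The code below assumes that the kudu-spark jar is already on the Spark classpath, and that the data source name `org.apache.kudu.spark.kudu` and the option keys `kudu.master` and `kudu.table` match the installed kudu-spark version. The helper name `read_kudu_table` is hypothetical and introduced only for illustration.

```python
# Sketch only: read a Kudu table from PySpark through the kudu-spark data source.
# Assumption: the kudu-spark jar is on the Spark classpath (for example via --jars).

# Data source name assumed to be registered by the kudu-spark package.
KUDU_FORMAT = "org.apache.kudu.spark.kudu"


def read_kudu_table(spark, master, table):
    """Return a DataFrame backed by a Kudu table.

    spark  -- an existing SparkSession (or SQLContext); anything with a `.read`
              attribute that behaves like Spark's DataFrameReader
    master -- Kudu master address as "host:port" (placeholder, supply your own)
    table  -- Kudu table name (placeholder, supply your own)
    """
    return (
        spark.read
        .format(KUDU_FORMAT)            # select the Kudu data source
        .option("kudu.master", master)  # where the Kudu master is running
        .option("kudu.table", table)    # which Kudu table to expose
        .load()                         # builds the DataFrame lazily
    )
```

With a live session this would be called as `read_kudu_table(spark, "<master-host>:<port>", "<table-name>")`. Reading goes through the JVM data source, so no Python-specific Kudu binding is involved; whether writes and updates are available the same way depends on the kudu-spark version in use.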

> 2016-12-13 16:05 GMT+01:00 Frank Heimerzheim <fh.ordix@gmail.com>:
>> Hello Todd,
>>
>> thanks a lot for the clarification.
>>
>> Greetings
>> Frank

>> 2016-12-13 15:36 GMT+01:00 Todd Lipcon <todd@cloudera.com>:
>>> Hi Frank,
>>>
>>> I'm sorry to say that the Java storage handler implementation you're looking for doesn't exist. The Hive metastore requires that non-HDFS storage engines set some value for the 'storage handler' property, so Impala uses that special string to denote a Kudu table in the HMS. However, there is no such Java implementation - Impala detects this class name and uses its own implementation to plan and execute queries against Kudu.
>>>
>>> The Hive support for Kudu is tracked here: https://issues.apache.org/jira/browse/HIVE-12971
>>> This work isn't committed to the Hive project but there is a prototype on github that you could try. Note that it's not being actively developed by the Kudu dev community at this point in time, but if you get it working, please report back with your experiences.
>>>
>>> Thanks
>>> -Todd

>>> On Tue, Dec 13, 2016 at 6:12 PM, Frank Heimerzheim <fh.ordix@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> within the impala-shell I can create an external table and thereafter select and insert data from an underlying Kudu table. Within the statement for creation of the table a 'StorageHandler' will be set to 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as there apparently exists a *.jar containing the referenced library.
>>>>
>>>> When trying to select from a hive-shell there is an error that the handler is not available. Trying to 'rdd.collect()' from a hiveCtx within a sparkSession I also get an error, JavaClassNotFoundException, as the KuduStorageHandler is not available.
>>>>
>>>> I then tried to find a jar in my system with the intention to copy it to all my data nodes. Sadly I couldn't find the specific jar. I think it exists in the system, as Impala is apparently using it. For a test I've changed the 'StorageHandler' in the creation statement to 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement worked. So did the select from Impala, but it didn't return any data. There was no error as I expected. The test was just for the case that Impala would in some magic way select data from Kudu without a correct 'StorageHandler'. Apparently this is not the case and Impala has access to a 'com.cloudera.kudu.hive.KuduStorageHandler'.
>>>>
>>>> Long story, short question:
>>>> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
>>>> Is copying the jar by hand to all nodes an appropriate way to put Spark in a position to work with Kudu?
>>>> What about the beeline shell from Hive and the possibility to read from Kudu?
>>>>
>>>> My environment: Cloudera 5.7 with kudu and impala-kudu installed from parcels. I successfully built a working python-kudu library from scratch (git).
>>>>
>>>> Thanks a lot!
>>>> Frank



>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera





--
Todd Lipcon
Software Engineer, Cloudera