kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'
Date Tue, 10 Jan 2017 05:26:30 GMT
On Mon, Jan 9, 2017 at 2:54 AM, Frank Heimerzheim <fh.ordix@gmail.com>
wrote:

> Hello Todd,
>
> one additional question:
>
> There is a KuduContext in org.apache.kudu.spark.kudu._ that provides
> read/write/update operations for use with Scala and Spark. I'm now looking
> for a similar solution for Python and Spark. I've found
> https://github.com/bkvarda/iot_demo, which looks fine at first glance, but
> I would much prefer an "official" solution. Is anything to be expected in
> the near future? Or is there a way, which I don't know of yet, to use the
> Scala library from Python?
>

I'm not a real Spark expert (especially not pyspark) so I don't have a
great answer to this question. The github demo you linked above looks like
a reasonable approach, though.

Jordan Birdsell is our primary Python expert, and he filed
https://issues.apache.org/jira/browse/KUDU-1603 a while back. Hopefully he
will chime in with a better answer than I can give :)
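One route from Python is Spark's DataFrame data source API, which lets PySpark call the JVM-side kudu-spark library without a Python port. The sketch below is a minimal, hedged example. It assumes the kudu-spark jar matching your Spark and Scala versions is already on the driver and executor classpath (for example via spark-submit's --packages option), and that your kudu-spark version registers the `org.apache.kudu.spark.kudu` data source with the `kudu.master` and `kudu.table` options. The helper names, host name and table name are hypothetical placeholders; `kudu.table` takes the Kudu-side table name, which may differ from the Impala/Hive table name.

```python
# Sketch: reading a Kudu table from PySpark through the kudu-spark data source.
# Assumption: the kudu-spark jar is on the Spark classpath. Nothing is imported
# from pyspark here, so the helpers can be defined without a running cluster.

# Data source name registered by kudu-spark (assumed for your kudu-spark version).
KUDU_SPARK_FORMAT = "org.apache.kudu.spark.kudu"


def kudu_read_options(master, table):
    """Build the reader options naming the Kudu master(s) and the Kudu table."""
    return {"kudu.master": master, "kudu.table": table}


def read_kudu_table(spark, master, table):
    """Return a DataFrame for `table`; `spark` is an existing SparkSession."""
    reader = spark.read.format(KUDU_SPARK_FORMAT)
    for key, value in kudu_read_options(master, table).items():
        reader = reader.option(key, value)
    return reader.load()


# Usage (hypothetical host and table; 7051 is Kudu's default master RPC port):
# df = read_kudu_table(spark, "kudu-master.example.com:7051", "my_table")
# df.show()
```

This covers reads only; KuduContext's write and update methods are not exposed through this helper.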

-Todd

2016-12-13 16:05 GMT+01:00 Frank Heimerzheim <fh.ordix@gmail.com>:
>
>> Hello Todd,
>>
>> thanks a lot for the clarification.
>>
>> Greetings
>> Frank
>>
>> 2016-12-13 15:36 GMT+01:00 Todd Lipcon <todd@cloudera.com>:
>>
>>> Hi Frank,
>>>
>>> I'm sorry to say that the Java storage handler implementation you're
>>> looking for doesn't exist. The Hive metastore requires that non-HDFS
>>> storage engines set some value for the 'storage handler' property, so
>>> Impala uses that special string to denote a Kudu table in the HMS. However,
>>> there is no such Java implementation; Impala detects this class name and
>>> uses its own implementation to plan and execute queries against Kudu.
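To make the mechanism concrete, here is a toy Python model (not Impala's actual code) of the check described above: the engine reads the `storage_handler` table parameter from the Hive metastore and treats the table as Kudu only when the value matches the marker string, without ever loading a Java class for it. The function names and return labels are hypothetical, and the exact-match comparison is an assumption of this sketch.

```python
# Toy model (hypothetical, not Impala source) of dispatching on a marker string
# stored in the Hive metastore's 'storage_handler' table parameter.
KUDU_STORAGE_HANDLER = "com.cloudera.kudu.hive.KuduStorageHandler"


def is_kudu_table(table_params):
    """True only if the 'storage_handler' parameter equals the marker exactly."""
    return table_params.get("storage_handler") == KUDU_STORAGE_HANDLER


def plan_scan(table_params):
    """Pick a scan path from the table parameters; no Java class is loaded."""
    if is_kudu_table(table_params):
        return "native-kudu-scan"  # the engine's own built-in Kudu code path
    return "non-kudu-scan"  # anything else, including a misspelled handler


print(plan_scan({"storage_handler": KUDU_STORAGE_HANDLER}))           # native-kudu-scan
print(plan_scan({"storage_handler": KUDU_STORAGE_HANDLER + "_foo"}))  # non-kudu-scan
```

An engine that instead resolves the handler name as a Java class, as Hive does, needs that class on its classpath, which is why the same table fails there with a class-not-found error.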
>>>
>>> The Hive support for Kudu is tracked here: https://issues.apache.org/jira/browse/HIVE-12971
>>> This work isn't committed to the Hive project, but there is a prototype
>>> on github that you could try. Note that it's not being actively developed
>>> by the Kudu dev community at this point in time, but if you get it working,
>>> please report back with your experiences.
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Tue, Dec 13, 2016 at 6:12 PM, Frank Heimerzheim <fh.ordix@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Within impala-shell I can create an external table and then select
>>>> and insert data from an underlying Kudu table. In the CREATE TABLE
>>>> statement, the 'StorageHandler' is set to
>>>> 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as
>>>> there apparently exists a *.jar containing the referenced library.
>>>>
>>>> When I try to select from the Hive shell, I get an error that the
>>>> handler is not available. Calling 'rdd.collect()' on a hiveCtx within a
>>>> SparkSession, I also get a JavaClassNotFoundException because the
>>>> KuduStorageHandler is not available.
>>>>
>>>> I then tried to find the jar on my system, intending to copy it to all
>>>> my data nodes. Sadly, I couldn't find the specific jar. I think it exists
>>>> on the system, as Impala is apparently using it. As a test, I changed the
>>>> 'StorageHandler' in the CREATE statement to
>>>> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The CREATE statement
>>>> worked, and so did the select from Impala, but it didn't return any data.
>>>> There was no error as I expected. The test was only to check whether
>>>> Impala would somehow select data from Kudu without a correct
>>>> 'StorageHandler'. Apparently this is not the case, and Impala has access
>>>> to a 'com.cloudera.kudu.hive.KuduStorageHandler'.
>>>>
>>>> Long story, short question:
>>>> In which *.jar can I find 'com.cloudera.kudu.hive.KuduStorageHandler'?
>>>> Is copying the jar by hand to all nodes an appropriate way to enable
>>>> Spark to work with Kudu?
>>>> What about the Hive beeline shell and the possibility to read from
>>>> Kudu?
>>>>
>>>> My environment: Cloudera 5.7 with Kudu and impala-kudu installed from
>>>> parcels. I successfully built a working python-kudu library from scratch
>>>> (git).
>>>>
>>>> Thanks a lot!
>>>> Frank
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
