kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'
Date Tue, 13 Dec 2016 14:36:25 GMT
Hi Frank,

I'm sorry to say that the Java storage handler implementation you're
looking for doesn't exist. The Hive metastore requires that non-HDFS
storage engines set some value for the 'storage handler' property, so
Impala uses that special string to denote a Kudu table in the HMS. However,
there is no such Java implementation: Impala detects this class name and
uses its own implementation to plan and execute queries against Kudu.
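
To make the "marker string" idea concrete, here is a minimal sketch in Python. It is not Impala's actual code: the function name, the 'storage_handler' property key, and the returned labels are all hypothetical, chosen only to show that the class name is compared as a plain string and never loaded as a Java class.

```python
# Marker string quoted in this thread. Impala writes it into the HMS
# 'storage handler' property of a Kudu table.
KUDU_MARKER = "com.cloudera.kudu.hive.KuduStorageHandler"


def choose_scan_path(table_properties):
    """Pick a (hypothetical) scan path from a table's metastore properties.

    The 'storage_handler' key and the returned labels are illustrative
    assumptions, not real Impala or Hive metastore identifiers.
    """
    handler = table_properties.get("storage_handler")
    if handler == KUDU_MARKER:
        # Only a string comparison happens here; no Java class is loaded.
        return "native-kudu"
    if handler is None:
        # No handler set: treat the table as a regular HDFS-backed table.
        return "hdfs"
    # Any other value is simply not recognized as the Kudu marker.
    return "unknown-handler"


print(choose_scan_path({"storage_handler": KUDU_MARKER}))
print(choose_scan_path({"storage_handler": KUDU_MARKER + "_foo"}))
```

Under these assumptions, a renamed handler such as the '..._foo' variant from the quoted message is no longer recognized as a Kudu table. An engine such as Hive, which tries to load the named class instead of comparing strings, fails because no such class exists in any jar.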

The Hive support for Kudu is tracked here:
This work isn't committed to the Hive project, but there is a prototype on
GitHub that you could try. Note that it's not being actively developed by
the Kudu dev community at this point in time, but if you get it working,
please report back with your experiences.


On Tue, Dec 13, 2016 at 6:12 PM, Frank Heimerzheim <fh.ordix@gmail.com> wrote:

> Hello,
> within the impala-shell I can create an external table and then
> select and insert data from an underlying Kudu table. In the statement
> that creates the table, a 'StorageHandler' is set to
>  'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as
> there apparently exists a *.jar containing the referenced class.
> When I try to select from a hive-shell, there is an error that the handler
> is not available. When I try 'rdd.collect()' from a hiveCtx within a
> sparkSession, I also get a JavaClassNotFoundException error, as
> the KuduStorageHandler is not available.
> I then tried to find the jar on my system with the intention of copying it to
> all my data nodes. Sadly, I couldn't find the specific jar. I think it
> exists on the system, as Impala is apparently using it. As a test, I
> changed the 'StorageHandler' in the creation statement to
> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement
> worked, and so did the select from Impala, but it didn't return any data. There
> was no error as I expected. The test was just to check whether Impala would
> somehow magically select data from Kudu without a correct 'StorageHandler'.
> Apparently this is not the case, and Impala has access to a
>  'com.cloudera.kudu.hive.KuduStorageHandler'.
> Long story, short question:
> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
> Is copying the jar by hand to all nodes an appropriate way
> to enable Spark to work with Kudu?
> What about the beeline shell from Hive and the possibility of reading from
> Kudu?
> My environment: Cloudera 5.7 with kudu and impala-kudu installed from
> parcels. I successfully built a working python-kudu library from scratch (git).
> Thanks a lot!
> Frank

Todd Lipcon
Software Engineer, Cloudera
