kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'
Date Mon, 13 Feb 2017 05:23:13 GMT
On Tue, Feb 7, 2017 at 6:17 AM, Frank Heimerzheim <fh.ordix@gmail.com> wrote:

> Hello,
> for quite a while I've worked successfully with
> https://maven2repo.com/org.apache.kudu/kudu-spark_2.10/1.2.0/jar
> For a while I ignored a problem with the Kudu datatype int8. With the connector
> I can't write int8, as an int in Python always brings up errors like
> "java.lang.IllegalArgumentException: id isn't [Type: int64, size: 8, Type:
> unixtime_micros, size: 8], it's int8"
> As Python isn't statically typed, the connector tries to find a suitable type
> for a Python int in Java/Kudu. Apparently the Python int is matched to
> int64/unixtime_micros and not to int8, which Kudu expects in this place.
> As a quick workaround, all my ints in Kudu are int64 at the moment.
> In the long run I can't accept this waste of disk space or, even worse, I/O.
> Any idea when I will be able to store int8 from Python/Spark to Kudu?
> With the "normal" Python API everything works fine; only the Spark/Kudu/Python
> connector brings up the problem.

Not 100% sure I'm following. You're using pyspark here? Can you post a bit
of sample code that reproduces the issue?
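
A minimal sketch of the suspected mismatch, in plain Python with no Spark
dependency. It assumes that kudu-spark pairs Kudu int8/int16/int32/int64 with
Spark's ByteType/ShortType/IntegerType/LongType, and that PySpark infers
LongType for a bare Python int when no schema is given. The names
KUDU_INT_TYPES, narrowest_kudu_int and needs_explicit_cast are hypothetical
helpers for illustration, not part of any Kudu or Spark API.

```python
# Signed integer widths, narrowest first. The Spark type names in the second
# column are the assumed kudu-spark pairing, not something read from the connector.
KUDU_INT_TYPES = [
    ("int8", "ByteType", 8),
    ("int16", "ShortType", 16),
    ("int32", "IntegerType", 32),
    ("int64", "LongType", 64),
]

# Assumption: with no explicit schema, PySpark infers LongType for Python ints.
INFERRED_SPARK_TYPE_FOR_PY_INT = "LongType"


def narrowest_kudu_int(values):
    """Return (kudu_type, spark_type) for the narrowest signed type holding all values."""
    lo, hi = min(values), max(values)
    for kudu_type, spark_type, bits in KUDU_INT_TYPES:
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return kudu_type, spark_type
    raise OverflowError("values do not fit in int64")


def needs_explicit_cast(values):
    """True when the inferred Spark type is wider than the column actually needs."""
    _, spark_type = narrowest_kudu_int(values)
    return spark_type != INFERRED_SPARK_TYPE_FOR_PY_INT


if __name__ == "__main__":
    ids = [1, 2, 3]
    print(narrowest_kudu_int(ids), needs_explicit_cast(ids))
```

If this is what is happening, declaring the column as ByteType in an explicit
schema before handing the DataFrame to the connector would be the first thing
to try; that is untested against kudu-spark 1.2.0.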


> 2016-12-13 12:12 GMT+01:00 Frank Heimerzheim <fh.ordix@gmail.com>:
>> Hello,
>> within the impala-shell I can create an external table and thereafter
>> select and insert data from an underlying Kudu table. In the statement
>> that creates the table, a 'StorageHandler' is set to
>> 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as
>> there apparently exists a *.jar containing the referenced class.
>> When trying to select from a Hive shell, there is an error that the
>> handler is not available. Trying to 'rdd.collect()' from a hiveCtx within
>> a sparkSession, I also get a JavaClassNotFoundException error, as
>> the KuduStorageHandler is not available.
>> I then tried to find the jar on my system with the intention of copying it to
>> all my data nodes. Sadly I couldn't find the specific jar. I think it
>> exists on the system, as Impala apparently is using it. As a test I
>> changed the 'StorageHandler' in the creation statement to
>> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement
>> worked. So did the select from Impala, but it didn't return any data. There
>> was no error as I expected. The test was just for the case that Impala would
>> somehow magically select data from Kudu without a correct 'StorageHandler'.
>> Apparently this is not the case, and Impala has access to a
>> 'com.cloudera.kudu.hive.KuduStorageHandler'.
>> Long story, short question:
>> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
>> Is copying the jar by hand to all nodes an appropriate way
>> to enable Spark to work with Kudu?
>> What about the beeline shell from Hive and the possibility of reading from
>> Kudu?
>> My environment: Cloudera 5.7 with kudu and impala-kudu installed from
>> parcels. I successfully built a working python-kudu library from scratch (git).
>> Thanks a lot!
>> Frank

Todd Lipcon
Software Engineer, Cloudera
