From: Todd Lipcon <todd@cloudera.com>
Date: Tue, 14 Feb 2017 10:44:49 -0800
Subject: Re: Missing 'com.cloudera.kudu.hive.KuduStorageHandler'
To: user@kudu.apache.org

Hi Frank,

Could you try something like:

data = [(42, 2017, 'John')]
schema = StructType([
    StructField("id", ByteType(), True),
    StructField("year", ByteType(), True),
    StructField("name", StringType(), True)])
df = sqlContext.createDataFrame(data, schema)

That should explicitly set the types (based on my reading of the pyspark
docs for createDataFrame).

-Todd

On Tue, Feb 14, 2017 at 1:11 AM, Frank Heimerzheim <fh.ordix@gmail.com> wrote:

> Hello,
>
> here is a snippet which produces the error.
>
> Call from the shell:
> spark-submit --jars /opt/storage/data_nfs/cloudera/pyspark/libs/kudu-spark_2.10-1.2.0.jar test.py
>
> Snippet from the python code test.py:
>
> (..)
> builder = kudu.schema_builder()
> builder.add_column('id', kudu.int64, nullable=False)
> builder.add_column('year', kudu.int32)
> builder.add_column('name', kudu.string)
> (..)
>
> (..)
> data = [(42, 2017, 'John')]
> df = sqlContext.createDataFrame(data, ['id', 'year', 'name'])
> df.write.format('org.apache.kudu.spark.kudu').option('kudu.master', kudu_master)\
>     .option('kudu.table', kudu_table)\
>     .mode('append')\
>     .save()
> (..)
>
> Error:
> 17/02/13 12:59:24 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 6, ls00152y.xxx.com, partition 1,PROCESS_LOCAL, 2096 bytes)
> 17/02/13 12:59:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 5) in 113 ms on ls00152y.xxx.com (1/2)
> 17/02/13 12:59:24 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 6, ls00152y.xx.com): java.lang.IllegalArgumentException: year isn't [Type: int64, size: 8, Type: unixtime_micros, size: 8], it's int32
>   at org.apache.kudu.client.PartialRow.checkColumn(PartialRow.java:462)
>   at org.apache.kudu.client.PartialRow.addLong(PartialRow.java:217)
>   at org.apache.kudu.spark.kudu.KuduContext$$anonfun$org$apache$kudu$spark$kudu$KuduContext$$writePartitionRows$1$$anonfun$apply$2.apply(KuduContext.scala:215)
>   at org.apache.kudu.spark.kudu.KuduContext$$anonfun$org$apache$kudu$spark$kudu$KuduContext$$writePartitionRows$1$$anonfun$apply$2.apply(KuduContext.scala:205)
>   at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.kudu.spark.kudu.KuduContext$$anonfun$org$apache$kudu$spark$kudu$KuduContext$$writePartitionRows$1.apply(KuduContext.scala:205)
>   at org.apache.kudu.spark.kudu.KuduContext$$anonfun$org$apache$kudu$spark$kudu$KuduContext$$writePartitionRows$1.apply(KuduContext.scala:203)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at org.apache.kudu.spark.kudu.KuduContext.org$apache$kudu$spark$kudu$KuduContext$$writePartitionRows(KuduContext.scala:203)
>   at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:181)
>   at org.apache.kudu.spark.kudu.KuduContext$$anonfun$writeRows$1.apply(KuduContext.scala:180)
>   at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
>   at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
>   at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
>   at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1869)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>
> Same result with kudu.int8 and kudu.int16. Only kudu.int64 works for me.
> The problem persists whether the attribute is part of the key or not.
>
> My greetings,
> Frank
>
>
> 2017-02-13 6:23 GMT+01:00 Todd Lipcon <todd@cloudera.com>:
>
>> On Tue, Feb 7, 2017 at 6:17 AM, Frank Heimerzheim <fh.ordix@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> For quite a while I have worked successfully with
>>> https://maven2repo.com/org.apache.kudu/kudu-spark_2.10/1.2.0/jar
>>>
>>> For a bit I ignored a problem with the kudu datatype int8.
>>> With the connector I can't write int8, as an int in python will always
>>> bring up errors like
>>>
>>> "java.lang.IllegalArgumentException: id isn't [Type: int64, size: 8,
>>> Type: unixtime_micros, size: 8], it's int8"
>>>
>>> As python isn't hard typed, the connector is trying to find a suitable
>>> type in java/kudu for the python int. Apparently the python int is matched
>>> to int64/unixtime_micros and not to int8, which kudu is expecting at this
>>> place.
>>>
>>> As a quick solution all my ints in kudu are int64 at the moment.
>>>
>>> In the long run I can't accept this waste of disk space or, even worse,
>>> I/O. Any idea when I will be able to store int8 from python/spark to kudu?
>>>
>>> With the "normal" python api everything works fine; only the
>>> spark/kudu/python connector brings up the problem.
>>>
>>
>> Not 100% sure I'm following. You're using pyspark here? Can you post a
>> bit of sample code that reproduces the issue?
>>
>> -Todd
>>
>>
>>> 2016-12-13 12:12 GMT+01:00 Frank Heimerzheim <fh.ordix@gmail.com>:
>>>
>>>> Hello,
>>>>
>>>> Within the impala-shell I can create an external table and thereafter
>>>> select and insert data from an underlying kudu table. Within the statement
>>>> for creation of the table a 'StorageHandler' is set to
>>>> 'com.cloudera.kudu.hive.KuduStorageHandler'. Everything works fine, as
>>>> apparently there exists a *.jar containing the referenced class.
>>>>
>>>> When trying to select from a hive-shell there is an error that the
>>>> handler is not available. Trying to 'rdd.collect()' from a hiveCtx within
>>>> a sparkSession I also get a JavaClassNotFoundException, as the
>>>> KuduStorageHandler is not available.
>>>>
>>>> I then tried to find the jar on my system with the intention of copying
>>>> it to all my data nodes. Sadly I couldn't find the specific jar. I think
>>>> it exists on the system, as impala apparently is using it. For a test I
>>>> changed the 'StorageHandler' in the creation statement to
>>>> 'com.cloudera.kudu.hive.KuduStorageHandler_foo'. The create statement
>>>> worked, and so did the select from impala, but it didn't return any data.
>>>> There was no error, which I had expected. The test was just for the case
>>>> that impala would in some magic way select data from kudu without a
>>>> correct 'StorageHandler'. Apparently this is not the case, and impala has
>>>> access to a 'com.cloudera.kudu.hive.KuduStorageHandler'.
>>>>
>>>> Long story, short question:
>>>> In which *.jar can I find the 'com.cloudera.kudu.hive.KuduStorageHandler'?
>>>> Is copying the jar by hand to all nodes an appropriate way to put spark
>>>> in a position to work with kudu?
>>>> What about the beeline-shell from hive and the possibility to read from
>>>> kudu?
>>>>
>>>> My environment: Cloudera 5.7 with kudu and impala-kudu installed from
>>>> parcels. I built a working python-kudu library successfully from scratch
>>>> (git).
>>>>
>>>> Thanks a lot!
>>>> Frank
>>>>
>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>

--
Todd Lipcon
Software Engineer, Cloudera
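
For reference, below is a minimal end-to-end sketch of the approach Todd
suggests, with the DataFrame schema types chosen to match the Kudu columns
from Frank's schema_builder snippet (LongType for the int64 id, IntegerType
for the int32 year), so that pyspark does not infer LongType for every
Python int. The SparkContext/SQLContext setup and the kudu_master/kudu_table
values are illustrative assumptions and not taken from the thread; it also
assumes the kudu-spark connector maps Spark SQL integer types to the Kudu
integer types of the same width.

    # Sketch only -- assumes a Spark 1.6-era pyspark job submitted with
    # kudu-spark_2.10-1.2.0.jar on the classpath, and an existing Kudu table
    # with the schema from the snippet above (id int64, year int32, name string).
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.types import (StructType, StructField,
                                   LongType, IntegerType, StringType)

    sc = SparkContext(appName="kudu-write-test")
    sqlContext = SQLContext(sc)

    # Explicit schema: without it, createDataFrame infers LongType for every
    # Python int, and the connector then calls addLong() on the int32 column,
    # which raises the IllegalArgumentException seen above.
    schema = StructType([
        StructField("id", LongType(), False),      # Kudu int64
        StructField("year", IntegerType(), True),  # Kudu int32
        StructField("name", StringType(), True)])

    data = [(42, 2017, 'John')]
    df = sqlContext.createDataFrame(data, schema)

    # Placeholder connection settings -- replace with real values.
    kudu_master = "kudu-master.example.com:7051"
    kudu_table = "my_kudu_table"

    df.write.format('org.apache.kudu.spark.kudu') \
        .option('kudu.master', kudu_master) \
        .option('kudu.table', kudu_table) \
        .mode('append') \
        .save()

If the Kudu columns were declared as int8 or int16 instead, ByteType() or
ShortType() in the corresponding StructField should presumably map the same
way, which is what Frank's original int8 question is after.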