From: Todd Lipcon
Date: Tue, 1 Mar 2016 09:57:40 -0800
Subject: Re: Spark SQL on kudu can not contains nullable columns?
To: user@kudu.incubator.apache.org

Perhaps we should target this for 0.7.1 as well, if we're going to do that
follow-up release? It seems like it should be an easy fix (and client-side
only).

-Todd

On Tue, Mar 1, 2016 at 9:29 AM, Jean-Daniel Cryans wrote:

> Ha, yeah, that's a good one. I opened this JIRA:
> https://issues.apache.org/jira/browse/KUDU-1360
>
> Basically, we forgot to check for nulls :)
>
> J-D
>
> On Tue, Mar 1, 2016 at 9:18 AM, Darren Hoo wrote:
>
>> Can Spark SQL on Kudu not handle nullable columns?
>>
>> I've created a table in Kudu (0.6.0) that has nullable columns.
>> When I try to use Spark SQL (with the Kudu Java client 0.7.0) like this:
>>
>> sqlContext.load("org.kududb.spark", Map("kudu.table" -> "contents",
>>   "kudu.master" -> "master1:7051")).registerTempTable("contents")
>> sqlContext.sql("SELECT * FROM contents limit 10").collectAsList()
>>
>> I get this error:
>>
>> 16/03/02 00:45:42 INFO DAGScheduler: Job 4 failed: collect at
>> <console>:20, took 11.813423 s
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 7.0 (TID 62, slave29): java.lang.IllegalArgumentException: The requested
>> column (4) is null
>>         at org.kududb.client.RowResult.checkNull(RowResult.java:475)
>>         at org.kududb.client.RowResult.getString(RowResult.java:321)
>>         at org.kududb.client.RowResult.getString(RowResult.java:308)
>>         at org.kududb.spark.KuduRelation.org$kududb$spark$KuduRelation$$getKuduValue(DefaultSource.scala:144)
>>         at org.kududb.spark.KuduRelation$$anonfun$buildScan$1$$anonfun$apply$1.apply(DefaultSource.scala:126)
>>         at org.kududb.spark.KuduRelation$$anonfun$buildScan$1$$anonfun$apply$1.apply(DefaultSource.scala:126)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>         at org.kududb.spark.KuduRelation$$anonfun$buildScan$1.apply(DefaultSource.scala:126)
>>         at org.kududb.spark.KuduRelation$$anonfun$buildScan$1.apply(DefaultSource.scala:124)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>         at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>         at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>         at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>         at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>         at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>         at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>         at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>         at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Is this due to a version incompatibility between my Kudu server (0.6.0)
>> and the Java client (0.7.0)?

-- 
Todd Lipcon
Software Engineer, Cloudera
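
For context on the fix J-D references: KUDU-1360 tracks a client-side change
in the connector's row-materialization path, which calls typed getters such
as RowResult.getString without first checking whether the cell is null.
Below is a minimal sketch of that kind of null guard, using the Kudu Java
client's RowResult.isNull accessor. The helper name and shape only mirror
the getKuduValue frame in the stack trace above; they are an illustrative
assumption, not the connector's actual code.

    import org.kududb.Type
    import org.kududb.client.RowResult

    // Sketch: read one column from a scanned row without tripping the
    // null check inside the typed getters. The typed getters (getString,
    // getInt, ...) throw IllegalArgumentException on a null cell, so we
    // test isNull first and surface the null to Spark instead.
    def getKuduValue(row: RowResult, index: Int, colType: Type): Any = {
      if (row.isNull(index)) {
        null
      } else {
        colType match {
          case Type.STRING => row.getString(index)
          case Type.INT32  => row.getInt(index)
          case Type.INT64  => row.getLong(index)
          case Type.BOOL   => row.getBoolean(index)
          case Type.FLOAT  => row.getFloat(index)
          case Type.DOUBLE => row.getDouble(index)
          case _ =>
            throw new IllegalArgumentException(s"Unhandled column type: $colType")
        }
      }
    }

The same pattern should work as an application-side workaround when reading
scanner results directly: call isNull before any typed getter on a column
that may contain nulls.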