From: Reynold Xin
Date: Tue, 24 Mar 2015 21:20:27 -0700
Subject: Re: Spark SQL (1.3.0) "import sqlContext.implicits._" seems not to work for converting a case class RDD to a DataFrame
To: Ted Yu
Cc: Zhiwei Chan, "dev@spark.apache.org"

In particular, from
http://spark.apache.org/docs/latest/sql-programming-guide.html:

"Additionally, the implicit conversions now only augment RDDs that are
composed of Products (i.e., case classes or tuples) with a method toDF,
instead of applying automatically."
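In other words, since 1.3 the conversion has to be invoked explicitly via
toDF. A minimal sketch of the new pattern (assuming a spark-shell session,
where sc and sqlContext are already in scope; the sample data is just for
illustration):

>>>
import sqlContext.implicits._  // adds toDF to RDDs of Products

case class Person(name: String, age: Int)

val rdd = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))

// registerTempTable is defined on DataFrame, not on RDD: convert first.
val df = rdd.toDF()
df.registerTempTable("people")
<<<
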
On Tue, Mar 24, 2015 at 9:07 PM, Ted Yu wrote:

> Please take a look at:
> ./sql/core/src/main/scala/org/apache/spark/sql/DataFrameHolder.scala
> ./sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala
>
> Cheers
>
> On Tue, Mar 24, 2015 at 8:46 PM, Zhiwei Chan wrote:
>
> > Hi all,
> >
> > I just upgraded Spark from 1.2.1 to 1.3.0, and changed "import
> > sqlContext.createSchemaRDD" to "import sqlContext.implicits._" in my
> > code. (I scanned the programming guide, and it seems this is the only
> > change I need to make.) But it now fails to compile with the following
> > error:
> > >>>
> > [ERROR] ...\magic.scala:527: error: value registerTempTable is not a
> > member of org.apache.spark.rdd.RDD[com.yhd.ycache.magic.Table]
> > [INFO] tableRdd.registerTempTable(tableName)
> > <<<
> >
> > Then I tried the exact example from the 1.3 programming guide in
> > spark-shell, and it hits the same error:
> > >>>
> > scala> sys.env.get("CLASSPATH")
> > res7: Option[String] =
> > Some(:/root/scala/spark-1.3.0-bin-hadoop2.4/conf:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar)
> >
> > scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> > sqlContext: org.apache.spark.sql.SQLContext =
> > org.apache.spark.sql.SQLContext@4b05b3ff
> >
> > scala> import sqlContext.implicits._
> > import sqlContext.implicits._
> >
> > scala> case class Person(name: String, age: Int)
> > defined class Person
> >
> > scala> val t1 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
> > 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(81443) called with
> > curMem=186397, maxMem=278302556
> > 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3 stored as values
> > in memory (estimated size 79.5 KB, free 265.2 MB)
> > 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(31262) called with
> > curMem=267840, maxMem=278302556
> > 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3_piece0 stored as
> > bytes in memory (estimated size 30.5 KB, free 265.1 MB)
> > 15/03/25 11:13:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in
> > memory on heju:48885 (size: 30.5 KB, free: 265.4 MB)
> > 15/03/25 11:13:35 INFO BlockManagerMaster: Updated info of block
> > broadcast_3_piece0
> > 15/03/25 11:13:35 INFO SparkContext: Created broadcast 3 from textFile
> > at <console>:34
> > t1: org.apache.spark.rdd.RDD[String] =
> > hdfs://heju:8020/user/root/magic/poolInfo.txt MapPartitionsRDD[9] at
> > textFile at <console>:34
> >
> > scala> val t2 = t1.flatMap(_.split("\n")).map(_.split(" ")).map(p =>
> > Person(p(0), 1))
> > t2: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[12] at map at
> > <console>:38
> >
> > scala> t2.registerTempTable("people")
> > <console>:41: error: value registerTempTable is not a member of
> > org.apache.spark.rdd.RDD[Person]
> >        t2.registerTempTable("people")
> >           ^
> > <<<
> >
> > I found the following explanation in the programming guide about
> > implicitly converting a case class RDD to a DataFrame, but I don't
> > understand what I should do. Could anyone tell me how to convert a
> > case class RDD to a DataFrame?
> >
> > >>>
> > Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)
> >
> > Many of the code examples prior to Spark 1.3 started with import
> > sqlContext._, which brought all of the functions from sqlContext into
> > scope. In Spark 1.3 we have isolated the implicit conversions for
> > converting RDDs into DataFrames into an object inside of the SQLContext.
> > Users should now write import sqlContext.implicits._.
> >
> > Additionally, the implicit conversions now only augment RDDs that are
> > composed of Products (i.e., case classes or tuples) with a method toDF,
> > instead of applying automatically.
> > <<<
> >
> > Thanks,
> > Jason
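
Applied to the transcript above, the fix is a one-line change. A sketch,
continuing the same shell session (t2 and sqlContext as defined there; the
query is only illustrative):

>>>
// Convert the RDD[Person] to a DataFrame explicitly, then register it.
val people = t2.toDF()
people.registerTempTable("people")

// The registered temp table can then be queried through the SQLContext.
sqlContext.sql("SELECT name FROM people").show()
<<<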