From: Reynold Xin
Date: Tue, 24 Mar 2015 21:20:27 -0700
Subject: Re: Spark SQL (1.3.0) "import sqlContext.implicits._" seems not to work for converting a case class RDD to a DataFrame
To: Ted Yu
Cc: Zhiwei Chan, "dev@spark.apache.org"

In particular, from
http://spark.apache.org/docs/latest/sql-programming-guide.html:

"Additionally, the implicit conversions now only augment RDDs that are
composed of Products (i.e., case classes or tuples) with a method toDF,
instead of applying automatically."
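In other words, since 1.3 the conversion has to be invoked explicitly via
toDF. A minimal sketch of the new pattern (assuming a spark-shell session,
where sc and sqlContext are already in scope; the sample data is just for
illustration):

>>>
import sqlContext.implicits._  // adds toDF to RDDs of Products

case class Person(name: String, age: Int)

val rdd = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))

// registerTempTable is defined on DataFrame, not on RDD: convert first.
val df = rdd.toDF()
df.registerTempTable("people")
<<<
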
On Tue, Mar 24, 2015 at 9:07 PM, Ted Yu wrote:

> Please take a look at:
> ./sql/core/src/main/scala/org/apache/spark/sql/DataFrameHolder.scala
> ./sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala
>
> Cheers
>
> On Tue, Mar 24, 2015 at 8:46 PM, Zhiwei Chan wrote:
>
> > Hi all,
> >
> > I just upgraded Spark from 1.2.1 to 1.3.0, and changed "import
> > sqlContext.createSchemaRDD" to "import sqlContext.implicits._" in my
> > code. (I scanned the programming guide, and it seems this is the only
> > change I need to make.) But it now fails to compile with the following
> > error:
> > >>>
> > [ERROR] ...\magic.scala:527: error: value registerTempTable is not a
> > member of org.apache.spark.rdd.RDD[com.yhd.ycache.magic.Table]
> > [INFO] tableRdd.registerTempTable(tableName)
> > <<<
> >
> > Then I tried the exact example from the 1.3 programming guide in
> > spark-shell, and it hits the same error:
> > >>>
> > scala> sys.env.get("CLASSPATH")
> > res7: Option[String] =
> > Some(:/root/scala/spark-1.3.0-bin-hadoop2.4/conf:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar)
> >
> > scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> > sqlContext: org.apache.spark.sql.SQLContext =
> > org.apache.spark.sql.SQLContext@4b05b3ff
> >
> > scala> import sqlContext.implicits._
> > import sqlContext.implicits._
> >
> > scala> case class Person(name: String, age: Int)
> > defined class Person
> >
> > scala> val t1 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
> > 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(81443) called with
> > curMem=186397, maxMem=278302556
> > 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3 stored as values
> > in memory (estimated size 79.5 KB, free 265.2 MB)
> > 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(31262) called with
> > curMem=267840, maxMem=278302556
> > 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3_piece0 stored as
> > bytes in memory (estimated size 30.5 KB, free 265.1 MB)
> > 15/03/25 11:13:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in
> > memory on heju:48885 (size: 30.5 KB, free: 265.4 MB)
> > 15/03/25 11:13:35 INFO BlockManagerMaster: Updated info of block
> > broadcast_3_piece0
> > 15/03/25 11:13:35 INFO SparkContext: Created broadcast 3 from textFile
> > at <console>:34
> > t1: org.apache.spark.rdd.RDD[String] =
> > hdfs://heju:8020/user/root/magic/poolInfo.txt MapPartitionsRDD[9] at
> > textFile at <console>:34
> >
> > scala> val t2 = t1.flatMap(_.split("\n")).map(_.split(" ")).map(p =>
> > Person(p(0), 1))
> > t2: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[12] at map at
> > <console>:38
> >
> > scala> t2.registerTempTable("people")
> > <console>:41: error: value registerTempTable is not a member of
> > org.apache.spark.rdd.RDD[Person]
> >        t2.registerTempTable("people")
> >           ^
> > <<<
> >
> > I found the following explanation in the programming guide about
> > implicitly converting a case class RDD to a DataFrame, but I don't
> > understand what I should do. Could anyone tell me how to convert a
> > case class RDD to a DataFrame?
> >
> > >>>
> > Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)
> >
> > Many of the code examples prior to Spark 1.3 started with import
> > sqlContext._, which brought all of the functions from sqlContext into
> > scope. In Spark 1.3 we have isolated the implicit conversions for
> > converting RDDs into DataFrames into an object inside of the SQLContext.
> > Users should now write import sqlContext.implicits._.
> >
> > Additionally, the implicit conversions now only augment RDDs that are
> > composed of Products (i.e., case classes or tuples) with a method toDF,
> > instead of applying automatically.
> > <<<
> >
> > Thanks,
> > Jason
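
Applied to the transcript above, the fix is a one-line change. A sketch,
continuing the same shell session (t2 and sqlContext as defined there; the
query is only illustrative):

>>>
// Convert the RDD[Person] to a DataFrame explicitly, then register it.
val people = t2.toDF()
people.registerTempTable("people")

// The registered temp table can then be queried through the SQLContext.
sqlContext.sql("SELECT name FROM people").show()
<<<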