spark-dev mailing list archives

From Michael Malak <michaelma...@yahoo.com.INVALID>
Subject Re: renaming SchemaRDD -> DataFrame
Date Mon, 26 Jan 2015 23:11:22 GMT
And on the off chance that anyone hasn't seen it yet, the Jan. 13 Bay Area Spark Meetup video on
YouTube contains a wealth of background information on this idea (mostly from Patrick and Reynold
:-).

https://www.youtube.com/watch?v=YWppYPWznSQ

________________________________
From: Patrick Wendell <pwendell@gmail.com>
To: Reynold Xin <rxin@databricks.com> 
Cc: "dev@spark.apache.org" <dev@spark.apache.org> 
Sent: Monday, January 26, 2015 4:01 PM
Subject: Re: renaming SchemaRDD -> DataFrame


One thing that is potentially not clear from this e-mail: there will be a 1:1
correspondence, so you can get an RDD from a DataFrame and a DataFrame from an RDD.


On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin <rxin@databricks.com> wrote:
> Hi,
>
> We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
> get the community's opinion.
>
> The context is that SchemaRDD is becoming a common data format used for
> bringing data into Spark from external systems, and used for various
> components of Spark, e.g. MLlib's new pipeline API. We also expect more and
> more users to be programming directly against SchemaRDD API rather than the
> core RDD API. SchemaRDD, through its less commonly used DSL originally
> designed for writing test cases, has always had a data-frame-like API. In
> 1.3, we are redesigning that API to make it usable for end users.
>
>
> There are two motivations for the renaming:
>
> 1. DataFrame seems to be a more self-evident name than SchemaRDD.
>
> 2. SchemaRDD/DataFrame is actually not going to be an RDD anymore (even
> though it would contain some RDD functions like map, flatMap, etc.), and
> calling it Schema*RDD* while it is not an RDD is highly confusing. Instead,
> DataFrame.rdd will return the underlying RDD for all RDD methods.
>
>
> My understanding is that very few users program directly against the
> SchemaRDD API at the moment, because it is not well documented. However,
> to maintain backward compatibility, we can create a type alias for DataFrame
> that is still named SchemaRDD. This will maintain source compatibility for
> Scala. That said, we will have to update all existing materials to use
> DataFrame rather than SchemaRDD.
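
For readers who want to see the shape of the proposal, here is a minimal sketch in plain Scala. It does not use Spark: ToyRDD, Row, and this DataFrame class are hypothetical stand-ins invented for illustration, and only the names DataFrame, SchemaRDD, and the .rdd accessor come from the e-mail above. It shows a DataFrame that holds an RDD rather than extending one, and a type alias that lets code written against the old name keep compiling.

```scala
// Toy stand-ins only: none of these are Spark's real classes.
object RenameSketch {
  // Hypothetical minimal "RDD": just wraps an in-memory sequence.
  final class ToyRDD[A](val rows: Seq[A]) {
    def map[B](f: A => B): ToyRDD[B] = new ToyRDD(rows.map(f))
  }

  // Hypothetical row type for this sketch.
  type Row = Seq[Any]

  // The renamed class holds an RDD instead of being one (motivation 2).
  final class DataFrame(val columns: Seq[String], underlying: ToyRDD[Row]) {
    // Accessor named after the DataFrame.rdd mentioned in the e-mail.
    def rdd: ToyRDD[Row] = underlying

    // A few RDD-style functions can still be offered by delegation.
    def map[B](f: Row => B): ToyRDD[B] = underlying.map(f)
  }

  // Source-compatibility alias: code that still says SchemaRDD compiles
  // unchanged, because the old name now refers to the new class.
  type SchemaRDD = DataFrame

  def main(args: Array[String]): Unit = {
    val rdd = new ToyRDD[Row](Seq(Seq("a", 1), Seq("b", 2)))

    // Old name on the left, new class on the right.
    val legacy: SchemaRDD = new DataFrame(Seq("name", "n"), rdd)

    // RDD -> DataFrame -> RDD round trip returns the wrapped RDD.
    println(legacy.rdd eq rdd)
    println(legacy.map(_.head).rows)
  }
}
```

Because a Scala type alias is resolved at compile time, this kind of alias gives source compatibility only, which matches the wording in the e-mail; it says nothing about binary compatibility or about the Java and Python APIs.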

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


