spark-dev mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Spark 2.0 Dataset Documentation
Date Sat, 18 Jun 2016 05:44:58 GMT
As mentioned in the PR description, this is just an initial PR to bring 
the existing content up to date, so that people can add more content 
incrementally.

We should definitely cover more about Dataset.


Cheng


On 6/17/16 10:28 PM, Pedro Rodriguez wrote:
> The updates look great!
>
> Looks like many places are updated to the new APIs, but there still 
> isn't a section for working with Datasets (most of the docs work with 
> DataFrames). Are you planning on adding more? I am thinking of something 
> that would address common questions like the one I posted on the user 
> email list earlier today.
>
> Should I take discussion to your PR?
>
> Pedro
>
> On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs.zju@gmail.com> wrote:
>
>     Hey Pedro,
>
>     SQL programming guide is being updated. Here's the PR, but not
>     merged yet: https://github.com/apache/spark/pull/13592
>
>     Cheng
>
>     On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
>>     Hi All,
>>
>>     At my workplace we are starting to use Datasets in 1.6.1 and even
>>     more with Spark 2.0 in place of Dataframes. I looked at the 1.6.1
>>     documentation then the 2.0 documentation and it looks like not
>>     much time has been spent writing a Dataset guide/tutorial.
>>
>>     Preview Docs:
>>     https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html#creating-datasets
>>     Spark master docs:
>>     https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>>
>>
>>     I would like to spend the time to contribute an improvement to
>>     those docs with more in-depth examples of creating and using
>>     Datasets (e.g., using $ to select columns). Is this of value, and if
>>     so, what should my next step be to get this going (create a JIRA, etc.)?
>>
>>     -- 
>>     Pedro Rodriguez
>>     PhD Student in Distributed Machine Learning | CU Boulder
>>     R&D Data Science Intern at Oracle Data Cloud
>>     UC Berkeley AMPLab Alumni
>>
>>     ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
>>     Github: github.com/EntilZha |
>>     LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience
>>
>
>
>
>
> -- 
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha |
> LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience
>

