spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23114) Spark R 2.3 QA umbrella
Date Sun, 21 Jan 2018 23:03:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333725#comment-16333725 ]

Felix Cheung edited comment on SPARK-23114 at 1/21/18 11:02 PM:
----------------------------------------------------------------

[~sameerag]

Here are some ideas for the release notes (which go on spark-website in the announcements).

For SparkR, new in 2.3.0:

SQL changes:

SQL functions, cubing & nested structure

collect_list, collect_set, split_string, repeat_string, rollup, cube,
explode_outer, posexplode_outer, %<=>%, !, not, create_array, create_map, grouping_bit, grouping_id,
input_file_name, alias, trunc, date_trunc, map_keys, map_values, current_date, current_timestamp, trim/trimString,
dayofweek, unionByName

to_json (map or array of maps)

Data Source - multiLine (json/csv)

 

ML changes:

Decision Tree (regression and classification)

Constrained Logistic Regression
offset in SparkR GLM [https://github.com/apache/spark/pull/18831]
stringIndexerOrderType
handleInvalid (spark.svmLinear, spark.logit, spark.mlp, spark.naiveBayes, spark.gbt, spark.decisionTree, spark.randomForest)

 

SS changes:

Structured Streaming API for withWatermark, trigger (once, processingTime), partitionBy

stream-stream join

 

Documentation:

major overhaul and simplification of API doc for SQL functions

 


was (Author: felixcheung):
[~sameerag]

Here are some ideas for the release notes (which go on spark-website in the announcements).

For SparkR, new in 2.3.0:

SQL changes:

SQL functions, cubing & nested structure

collect_list, collect_set, split_string, repeat_string, rollup, cube,
explode_outer, posexplode_outer, %<=>%, !, not, create_array, create_map, grouping_bit, grouping_id,
input_file_name, alias, trunc, date_trunc, map_keys, map_values, current_date, current_timestamp, trim/trimString,
dayofweek, unionByName

to_json (map or array of maps)

Data Source -  multiLine (json/csv)

 

ML changes:

Decision Tree (regression and classification)

Constrained Logistic Regression
offset in SparkR GLM https://github.com/apache/spark/pull/18831
stringIndexerOrderType
handleInvalid (spark.svmLinear, spark.logit, spark.mlp, spark.naiveBayes, spark.gbt, spark.decisionTree, spark.randomForest)

 

SS changes:

Structured Streaming API for withWatermark, trigger (once, processingTime), partitionBy

stream-stream join

 

Documentation:

major overhaul and simplification of API doc

 

> Spark R 2.3 QA umbrella
> -----------------------
>
>                 Key: SPARK-23114
>                 URL: https://issues.apache.org/jira/browse/SPARK-23114
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, SparkR
>            Reporter: Joseph K. Bradley
>            Assignee: Felix Cheung
>            Priority: Critical
>
> This JIRA lists tasks for the next Spark release's QA period for SparkR.
> The list below gives an overview of what is involved, and the corresponding JIRA issues are linked below that.
> h2. API
> * Audit new public APIs (from the generated html doc)
> ** relative to Spark Scala/Java APIs
> ** relative to popular R libraries
> h2. Documentation and example code
> * For new algorithms, create JIRAs for updating the user guide sections & examples
> * Update Programming Guide
> * Update website



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

