spark-issues mailing list archives

From "Felix Cheung (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-16090) Improve method grouping in SparkR generated docs
Date Tue, 21 Jun 2016 21:08:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342696#comment-15342696 ]

Felix Cheung edited comment on SPARK-16090 at 6/21/16 9:08 PM:
---------------------------------------------------------------

For example, this is the HTML output for gapply:

{code}
# S4 method for signature 'GroupedData'
gapply(x, func, schema)

## S4 method for signature 'SparkDataFrame'
gapply(x, cols, func, schema)
</pre>


<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
<p>a GroupedData</p>
</td></tr>
<tr valign="top"><td><code>func</code></td>
<td>
<p>A function to be applied to each group partition specified by GroupedData.
The function 'func' takes as argument a key - grouping columns and
a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.</p>
</td></tr>
<tr valign="top"><td><code>schema</code></td>
<td>
<p>The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.</p>
</td></tr>
<tr valign="top"><td><code>cols</code></td>
<td>
<p>Grouping columns</p>
</td></tr>
<tr valign="top"><td><code>x</code></td>
<td>
<p>A SparkDataFrame</p>
</td></tr>
<tr valign="top"><td><code>func</code></td>
<td>
<p>A function to be applied to each group partition specified by grouping
column of the SparkDataFrame. The function 'func' takes as argument
a key - grouping columns and a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.</p>
</td></tr>
<tr valign="top"><td><code>schema</code></td>
<td>
<p>The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.</p>
</td></tr>
</table>
{code}

As you can see, func and schema (and x) are each listed twice under Arguments, with different wording.
We should see whether we can describe each argument one way and list it only once (i.e. a single "@param
func").
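To spot the same duplication on other generated pages, here is a minimal sketch of a check over the rendered Arguments table. It assumes the markup has the shape of the excerpt above (each row's first <td> holds the argument name inside <code>). ArgNameCollector and duplicated_args are hypothetical helpers, not part of SparkR or roxygen2; the sketch only uses Python's standard html.parser.

{code:python}
from collections import Counter
from html.parser import HTMLParser


class ArgNameCollector(HTMLParser):
    """Collect argument names from an R argblock table.

    Assumes each <tr> starts with <td><code>name</code></td>; <code> tags in
    the description cell (second <td>) are ignored.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._cell_index = -1   # index of the current <td> within its <tr>
        self._in_code = False   # inside <code> of the first cell?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1
        elif tag == "td":
            self._cell_index += 1
        elif tag == "code" and self._cell_index == 0:
            self._in_code = True

    def handle_endtag(self, tag):
        if tag == "code":
            self._in_code = False

    def handle_data(self, data):
        if self._in_code:
            self.names.append(data.strip())


def duplicated_args(html):
    """Return the sorted argument names that appear in more than one row."""
    parser = ArgNameCollector()
    parser.feed(html)
    parser.close()
    counts = Counter(parser.names)
    return sorted(name for name, n in counts.items() if n > 1)
{code}

Run over the table rows quoted above, it should report func, schema and x; a page that documents each argument with a single @param would give an empty list.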




> Improve method grouping in SparkR generated docs
> ------------------------------------------------
>
>                 Key: SPARK-16090
>                 URL: https://issues.apache.org/jira/browse/SPARK-16090
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, SparkR
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Priority: Critical
>
> This JIRA follows the discussion on https://github.com/apache/spark/pull/13109 to improve
> method grouping in SparkR generated docs. Having one method per doc page is not an R convention.
> However, having many methods per doc page hurts readability, so a sensible grouping
> would help. Since we use roxygen2 instead of writing Rd files directly, we should consider
> smaller groups to avoid confusion.
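As a simplified model of why the gapply page ends up with two copies of func and schema: roxygen2 merges every block that shares an @rdname into one Rd file, so each method's own @param entries land side by side. The sketch below keeps one entry per parameter name and records which signatures use it. merge_param_docs and render_arguments are hypothetical helpers and this is not roxygen2's actual merge logic; the input is assumed to be (name, description) pairs already extracted per method.

{code:python}
def merge_param_docs(blocks):
    """Collapse per-method @param entries into one entry per parameter name.

    blocks: list of (signature, [(param_name, description), ...]) pairs,
    one per S4 method documented on the same page.
    Returns a dict param_name -> (description, [signatures]) in first-seen
    order; the first description seen for a name is the one kept.
    """
    merged = {}
    for signature, params in blocks:
        for name, description in params:
            if name in merged:
                # Already documented: only record that this signature uses it too.
                merged[name][1].append(signature)
            else:
                merged[name] = (description, [signature])
    return merged


def render_arguments(merged):
    """Render one plain-text line per parameter, listing the signatures that use it."""
    return [
        f"{name}: {description} (signatures: {', '.join(signatures)})"
        for name, (description, signatures) in merged.items()
    ]


if __name__ == "__main__":
    # Illustrative descriptions, reworded so one text covers both signatures.
    gapply_blocks = [
        ("GroupedData", [
            ("x", "a GroupedData or a SparkDataFrame"),
            ("func", "a function applied to each group; returns a local R data.frame"),
            ("schema", "the schema of the resulting SparkDataFrame"),
        ]),
        ("SparkDataFrame", [
            ("x", "a SparkDataFrame"),
            ("cols", "grouping columns"),
            ("func", "a function applied to each group of the SparkDataFrame"),
            ("schema", "the schema of the resulting SparkDataFrame"),
        ]),
    ]
    for line in render_arguments(merge_param_docs(gapply_blocks)):
        print(line)
{code}

Because the first description wins, whichever @param is kept has to be worded to cover every signature (for example, x described as either a GroupedData or a SparkDataFrame), which is the "explain it one way" step the comment asks for.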



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


