spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21485) API Documentation for Spark SQL functions
Date Thu, 20 Jul 2017 09:45:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094437#comment-16094437 ]

Hyukjin Kwon edited comment on SPARK-21485 at 7/20/17 9:44 AM:
---------------------------------------------------------------

I was thinking of adding a new SQL tab to the page that currently lists only R, Python, Java and Scala, for example
here: https://spark.apache.org/docs/latest/api.html

Yes, I meant that this page can be generated easily with the code above (essentially 20-30 lines).
The final step above is analogous to {{jekyll serve}}. {{mkdocs}} can also generate HTML from Markdown
files, and I was thinking of adding the code above to this documentation build.

The link https://spark-test.github.io/sparksqldoc/ is a site I hosted manually to illustrate the
idea, but its content is exactly the same as the site generated by the code above.


was (Author: hyukjin.kwon):
I was thinking of adding a new SQL tab to the page that currently lists only R, Python, Java and Scala, for example
here: https://spark.apache.org/docs/latest/api.html

Yes, I meant that this page can be generated easily with the code above (essentially 20-30 lines).
The final step above is analogous to {{jekyll serve}}. {{mkdocs}} can also generate HTML from Markdown
files, and I was thinking of adding the code above to this documentation build.

The link https://spark-test.github.io/sparksqldoc/ is what I built separately to illustrate the
idea, but its contents are the same as those generated by the code above.

> API Documentation for Spark SQL functions
> -----------------------------------------
>
>                 Key: SPARK-21485
>                 URL: https://issues.apache.org/jira/browse/SPARK-21485
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, SQL
>    Affects Versions: 2.3.0
>            Reporter: Hyukjin Kwon
>
> It looks like we can generate the documentation for Spark's SQL functions from {{ExpressionDescription}} and {{ExpressionInfo}}.
> I had some time to play with this, so I made a rough version: https://spark-test.github.io/sparksqldoc/
> The code I used is below.
> In the {{pyspark}} shell:
> {code}
> from collections import namedtuple
> # Plain Python record mirroring the fields of the JVM-side ExpressionInfo.
> ExpressionInfo = namedtuple("ExpressionInfo", "className usage name extended")
> # Py4J call into the listBuiltinFunctions() helper added by the patch below.
> jinfos = spark.sparkContext._jvm.org.apache.spark.sql.api.python.PythonSQLUtils.listBuiltinFunctions()
> infos = []
> for jinfo in jinfos:
>     name = jinfo.getName()
>     # Usage and extended text use _FUNC_ as a placeholder for the function name.
>     usage = jinfo.getUsage()
>     usage = usage.replace("_FUNC_", name) if usage is not None else usage
>     extended = jinfo.getExtended()
>     extended = extended.replace("_FUNC_", name) if extended is not None else extended
>     infos.append(ExpressionInfo(
>         className=jinfo.getClassName(),
>         usage=usage,
>         name=name,
>         extended=extended))
>
> def strip(s):
>     # Drop the Scala source indentation from every line.
>     return "\n".join(line.strip() for line in s.split("\n"))
>
> # Write one Markdown section per function, sorted by name.
> with open("index.md", "w") as mdfile:
>     for info in sorted(infos, key=lambda i: i.name):
>         mdfile.write("### %s\n\n" % info.name)
>         if info.usage is not None:
>             mdfile.write("%s\n\n" % strip(info.usage))
>         if info.extended is not None:
>             # Fences on their own lines so the first line is not read as the info string.
>             mdfile.write("```\n%s\n```\n\n" % strip(info.extended).strip("\n"))
> {code}
> Before running the code above, I had to make this change first:
> {code:none}
> +++ b/sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala
> @@ -17,9 +17,15 @@
>  package org.apache.spark.sql.api.python
> +import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
> +import org.apache.spark.sql.catalyst.expressions.ExpressionInfo
>  import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
>  import org.apache.spark.sql.types.DataType
>  private[sql] object PythonSQLUtils {
>    def parseDataType(typeText: String): DataType = CatalystSqlParser.parseDataType(typeText)
> +
> +  def listBuiltinFunctions(): Array[ExpressionInfo] = {
> +    FunctionRegistry.functionSet.flatMap(f => FunctionRegistry.builtin.lookupFunction(f)).toArray
> +  }
>  }
> {code}
> And then, I ran this:
> {code}
> mkdir -p docs
> echo "site_name: Spark SQL 2.3.0" > mkdocs.yml
> echo "theme: readthedocs" >> mkdocs.yml
> mv index.md docs/index.md
> mkdocs serve  # preview locally; "mkdocs build" writes the static HTML to ./site instead
> {code}
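The rendering half of the pyspark snippet in the description can be separated from the Py4J lookup, so the Markdown output can be checked without a running Spark JVM. This is a sketch under that assumption, not the code in the description: `render_markdown`, `strip_lines` and `substitute` are hypothetical helper names, the namedtuple stands in for the JVM `ExpressionInfo` objects, and the sample records are made up for illustration rather than copied from Spark.

```python
from collections import namedtuple

# Hypothetical plain-Python stand-in for the JVM ExpressionInfo objects (no Spark needed).
ExpressionInfo = namedtuple("ExpressionInfo", "className usage name extended")


def strip_lines(text):
    """Strip leading/trailing whitespace from every line, like the `strip` helper in the description."""
    return "\n".join(line.strip() for line in text.split("\n"))


def substitute(text, name):
    """Replace the _FUNC_ placeholder with the function name; None stays None."""
    return text.replace("_FUNC_", name) if text is not None else None


def render_markdown(infos):
    """Render one Markdown section per function, sorted by function name."""
    parts = []
    for info in sorted(infos, key=lambda i: i.name):
        parts.append("### %s\n\n" % info.name)
        usage = substitute(info.usage, info.name)
        if usage is not None:
            parts.append("%s\n\n" % strip_lines(usage))
        extended = substitute(info.extended, info.name)
        if extended is not None:
            # Fences go on their own lines so the first line of the extended
            # text is not parsed as the code block's info string.
            parts.append("```\n%s\n```\n\n" % strip_lines(extended).strip("\n"))
    return "".join(parts)


if __name__ == "__main__":
    # Illustrative records only; the text is invented, not taken from Spark.
    samples = [
        ExpressionInfo("example.Upper", "_FUNC_(str) - Returns str in upper case.", "upper",
                       "\n    Examples:\n      > SELECT _FUNC_('abc');\n       ABC\n  "),
        ExpressionInfo("example.Abs", "_FUNC_(expr) - Returns the absolute value.", "abs", None),
    ]
    print(render_markdown(samples))
```

In the pyspark shell, the `infos` list built from `listBuiltinFunctions()` could be passed to `render_markdown` and the returned string written to `index.md`; substituting `_FUNC_` a second time is harmless because no placeholder is left.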

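The shell steps in the description could also be driven from Python, which may be easier to hook into an existing documentation build. A minimal sketch, assuming a hypothetical helper `setup_mkdocs_project` (not part of Spark or MkDocs) that takes the path of the generated `index.md`: it writes `mkdocs.yml` with the same two keys and moves the file into `docs/`. Unlike `echo ... >> mkdocs.yml`, it overwrites the config, so rerunning it does not append duplicate keys.

```python
import shutil
from pathlib import Path


def setup_mkdocs_project(project_dir, index_md, site_name="Spark SQL 2.3.0", theme="readthedocs"):
    """Hypothetical helper: write mkdocs.yml and move index.md into project_dir/docs/."""
    root = Path(project_dir)
    docs = root / "docs"
    # Equivalent of `mkdir -p docs`; also creates project_dir if it is missing.
    docs.mkdir(parents=True, exist_ok=True)
    # Overwrite rather than append, so rerunning does not duplicate keys.
    (root / "mkdocs.yml").write_text("site_name: %s\ntheme: %s\n" % (site_name, theme))
    target = docs / "index.md"
    # Remove a stale copy first so the move behaves the same on every platform.
    if target.exists():
        target.unlink()
    shutil.move(str(index_md), str(target))
    return target
```

After calling it, running `mkdocs serve` from the project directory previews the site, and `mkdocs build` writes the static HTML to `site/` by default.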


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
