spark-issues mailing list archives

From "Joseph Batchik (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-8007) Support resolving virtual columns in DataFrames
Date Fri, 17 Jul 2015 06:02:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630820#comment-14630820 ]

Joseph Batchik edited comment on SPARK-8007 at 7/17/15 6:01 AM:
----------------------------------------------------------------

Reynold, I started adding virtual columns to DataFrames and SQL queries for SPARK-8003 and
SPARK-8007. My initial code is here: https://github.com/JDrit/spark/commit/e34d3a7eabbc9c41c2dd85b128b2bb5713039e40.

The one issue I ran into is that the catalyst package cannot access org.apache.spark.sql.execution.expressions,
where SparkPartitionID resides. For prototyping purposes I copied SparkPartitionID into the
catalyst package, but I am wondering what the best way to deal with that dependency would be.
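The dependency problem can be illustrated with a small, self-contained Scala sketch of one possible alternative to copying the class: the resolver on the catalyst side only consults a name-to-expression table, and the layer that owns the expression passes that table in, so catalyst never has to import org.apache.spark.sql.execution.expressions. This is an assumed design for illustration only; `Expression`, `AttributeRef`, `PartitionIdExpr`, `Resolver` and `VirtualColumnDemo` are hypothetical stand-ins, not Spark's real Catalyst classes.

```scala
// Hypothetical, self-contained model of virtual-column resolution.
// None of these types are Spark's real Catalyst classes.

sealed trait Expression
// Stand-in for a reference to an ordinary column of the relation.
final case class AttributeRef(name: String) extends Expression
// Stand-in for the SparkPartitionID expression.
case object PartitionIdExpr extends Expression

// Assumed to live in the "catalyst" layer: it only knows a name -> builder
// table, so it never imports the package that defines the expression.
final class Resolver(
    columns: Set[String],
    virtualColumns: Map[String, () => Expression]) {

  // Real columns are checked first, then virtual ones; anything else fails.
  def resolve(name: String): Either[String, Expression] =
    if (columns.contains(name)) Right(AttributeRef(name))
    else virtualColumns.get(name) match {
      case Some(build) => Right(build())
      case None        => Left(s"cannot resolve column '$name'")
    }
}

object VirtualColumnDemo {
  // Assumed to live in the "execution" layer: it depends on the resolver and
  // injects the expressions it owns, so the dependency points one way only.
  val resolver = new Resolver(
    columns = Set("id", "value"),
    virtualColumns = Map("SPARK__PARTITION__ID" -> (() => PartitionIdExpr)))

  def main(args: Array[String]): Unit = {
    println(resolver.resolve("value"))
    println(resolver.resolve("SPARK__PARTITION__ID"))
    println(resolver.resolve("missing"))
  }
}
```

In this sketch real columns take precedence, so a user column that happens to be named SPARK__PARTITION__ID would shadow the virtual column; the opposite precedence is an equally valid choice.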
 

Can you let me know what you think of my changes and what else needs to be done?


> Support resolving virtual columns in DataFrames
> -----------------------------------------------
>
>                 Key: SPARK-8007
>                 URL: https://issues.apache.org/jira/browse/SPARK-8007
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> Create the infrastructure so we can resolve df("SPARK__PARTITION__ID") to the SparkPartitionID expression.
> A cool use case is to understand physical data skew:
> {code}
> df.groupBy("SPARK__PARTITION__ID").count()
> {code}
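To make the skew use case concrete, here is a Spark-free Scala sketch of what df.groupBy("SPARK__PARTITION__ID").count() is meant to report: every row is tagged with the index of the partition holding it, and rows are then counted per index. Partitions are modelled as plain nested `Seq`s, and `PartitionSkewDemo`, `rowsPerPartition` and `skewRatio` are illustrative names, not Spark APIs; the max-to-mean ratio is just one simple skew measure chosen for this example.

```scala
// Self-contained simulation (no Spark dependency) of grouping rows by a
// partition-ID column and counting them.
object PartitionSkewDemo {
  // Each inner Seq stands for the rows held by one physical partition.
  def rowsPerPartition[T](partitions: Seq[Seq[T]]): Map[Int, Long] =
    partitions.zipWithIndex
      .flatMap { case (rows, pid) => rows.map(_ => pid) } // tag each row with its partition ID
      .groupBy(identity)                                  // like groupBy("SPARK__PARTITION__ID")
      .map { case (pid, tags) => pid -> tags.size.toLong } // like .count()

  // Largest partition divided by the mean partition size:
  // 1.0 means perfectly balanced, larger values mean more skew.
  def skewRatio(counts: Map[Int, Long]): Double =
    if (counts.isEmpty) 1.0
    else {
      val sizes = counts.values
      val mean = sizes.sum.toDouble / sizes.size
      sizes.max / mean
    }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq.fill(90)("a"), Seq.fill(5)("b"), Seq.fill(5)("c"))
    val counts = rowsPerPartition(partitions)
    println(counts.toSeq.sortBy(_._1).mkString(", "))
    println(f"skew ratio: ${skewRatio(counts)}%.2f")
  }
}
```

As with a real groupBy, a partition that holds no rows produces no entry in the result.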



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


