spark-commits mailing list archives

From r...@apache.org
Subject spark git commit: [SPARK-7060][SQL] Add alias function to python dataframe
Date Fri, 24 Apr 2015 01:52:59 GMT
Repository: spark
Updated Branches:
  refs/heads/master 336f7f537 -> 2d010f7af


[SPARK-7060][SQL] Add alias function to python dataframe

This PR provides a way for Python users to work around https://issues.apache.org/jira/browse/SPARK-6231.

Author: Yin Huai <yhuai@databricks.com>

Closes #5634 from yhuai/pythonDFAlias and squashes the following commits:

8465acd [Yin Huai] Add an alias to a Python DF.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d010f7a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2d010f7a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2d010f7a

Branch: refs/heads/master
Commit: 2d010f7afe6ac8e67e07da6bea700e9e8c9e6cc2
Parents: 336f7f5
Author: Yin Huai <yhuai@databricks.com>
Authored: Thu Apr 23 18:52:55 2015 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Thu Apr 23 18:52:55 2015 -0700

----------------------------------------------------------------------
 python/pyspark/sql/dataframe.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2d010f7a/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index c8c30ce..4759f5f 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -453,6 +453,20 @@ class DataFrame(object):
         return [f.name for f in self.schema.fields]
 
     @ignore_unicode_prefix
+    def alias(self, alias):
+        """Returns a new :class:`DataFrame` with an alias set.
+
+        >>> from pyspark.sql.functions import *
+        >>> df_as1 = df.alias("df_as1")
+        >>> df_as2 = df.alias("df_as2")
+        >>> joined_df = df_as1.join(df_as2, col("df_as1.name") == col("df_as2.name"), 'inner')
+        >>> joined_df.select(col("df_as1.name"), col("df_as2.name"), col("df_as2.age")).collect()
+        [Row(name=u'Alice', name=u'Alice', age=2), Row(name=u'Bob', name=u'Bob', age=5)]
+        """
+        assert isinstance(alias, basestring), "alias should be a string"
+        return DataFrame(getattr(self._jdf, "as")(alias), self.sql_ctx)
+
+    @ignore_unicode_prefix
     def join(self, other, joinExprs=None, joinType=None):
         """Joins with another :class:`DataFrame`, using the given join expression.
 

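A note on the `getattr(self._jdf, "as")(alias)` call in the diff above: `as` is a reserved word in Python, so the method of that name on the wrapped JVM object cannot be invoked with ordinary attribute syntax (`self._jdf.as(alias)` does not parse). The following is a minimal, Spark-free sketch of that lookup pattern. `FakeJavaDataFrame` and `_as` are hypothetical stand-ins invented for illustration; they are not part of PySpark or Py4J.

```python
import keyword


class FakeJavaDataFrame(object):
    """Hypothetical stand-in for the wrapped JVM DataFrame (self._jdf)."""

    def __init__(self, alias=None):
        self.alias = alias


def _as(self, alias):
    # Return a new object carrying the alias, leaving the original untouched.
    return FakeJavaDataFrame(alias)


# "as" cannot appear in `def as(...)`, so attach the method under that name
# dynamically (assumption: this only imitates a foreign object exposing `as`).
setattr(FakeJavaDataFrame, "as", _as)

jdf = FakeJavaDataFrame()

# Dotted access with a keyword is rejected by the parser.
try:
    compile('jdf.as("df_as1")', "<example>", "eval")
    dotted_call_parses = True
except SyntaxError:
    dotted_call_parses = False

# Looking the attribute up by string name avoids the keyword problem.
aliased = getattr(jdf, "as")("df_as1")

print(keyword.iskeyword("as"))
print(dotted_call_parses)
print(aliased.alias)
```

The same string-based lookup works for any attribute whose name collides with a Python keyword, which is presumably why the Python method is exposed as `alias` rather than `as`.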
