spark-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From r...@apache.org
Subject spark git commit: [SPARK-7035] Encourage __getitem__ over __getattr__ on column access in the Python DataFrame API
Date Thu, 07 May 2015 08:02:10 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 703211b97 -> b929a7580


[SPARK-7035] Encourage __getitem__ over __getattr__ on column access in the Python DataFrame
API

Author: ksonj <kson@siberie.de>

Closes #5971 from ksonj/doc and squashes the following commits:

dadfebb [ksonj] __getitem__ is cleaner than __getattr__

(cherry picked from commit fae4e2d6094de57a438ee4188ce47fc5b01b96fe)
Signed-off-by: Reynold Xin <rxin@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b929a758
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b929a758
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b929a758

Branch: refs/heads/branch-1.4
Commit: b929a75800c7c33e017d9a2afcd9c5956c9c6e69
Parents: 703211b
Author: ksonj <kson@siberie.de>
Authored: Thu May 7 01:02:00 2015 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Thu May 7 01:02:08 2015 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b929a758/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index b8233ae..df4c123 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -139,7 +139,6 @@ DataFrames provide a domain-specific language for structured data manipulation
i
 
 Here we include some basic examples of structured data processing using DataFrames:
 
-
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
 {% highlight scala %}
@@ -242,6 +241,12 @@ df.groupBy("age").count().show();
 </div>
 
 <div data-lang="python"  markdown="1">
+In Python it's possible to access a DataFrame's columns either by attribute
+(`df.age`) or by indexing (`df['age']`). While the former is convenient for
+interactive data exploration, users are highly encouraged to use the
+latter form, which is future proof and won't break with column names that
+are also attributes on the DataFrame class.
+
 {% highlight python %}
 from pyspark.sql import SQLContext
 sqlContext = SQLContext(sc)
@@ -270,14 +275,14 @@ df.select("name").show()
 ## Justin
 
 # Select everybody, but increment the age by 1
-df.select(df.name, df.age + 1).show()
+df.select(df['name'], df['age'] + 1).show()
 ## name    (age + 1)
 ## Michael null
 ## Andy    31
 ## Justin  20
 
 # Select people older than 21
-df.filter(df.age > 21).show()
+df.filter(df['age'] > 21).show()
 ## age name
 ## 30  Andy
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


Mime
View raw message