spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] HyukjinKwon commented on a change in pull request #26496: [WIP][SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark
Date Mon, 02 Dec 2019 01:43:50 GMT
HyukjinKwon commented on a change in pull request #26496: [WIP][SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark
URL: https://github.com/apache/spark/pull/26496#discussion_r352393084
 
 

 ##########
 File path: python/pyspark/sql/types.py
 ##########
 @@ -1463,32 +1474,43 @@ class Row(tuple):
     Row(name='Alice', age=11)
 
     This form can also be used to create rows as tuple values, i.e. with unnamed
-    fields. Beware that such Row objects have different equality semantics:
+    fields. Row objects are compared for equality by the data values in each
+    position; field names are not compared:
 
     >>> row1 = Row("Alice", 11)
     >>> row2 = Row(name="Alice", age=11)
     >>> row1 == row2
-    False
-    >>> row3 = Row(a="Alice", b=11)
-    >>> row1 == row3
     True
+    >>> row3 = Row(age=11, name="Alice")
+    >>> row2 == row3
+    False
     """
 
-    def __new__(self, *args, **kwargs):
+    def __new__(cls, *args, **kwargs):
+        if _legacy_row_enabled:
+            return _LegacyRow(args, kwargs)
         if args and kwargs:
             raise ValueError("Can not use both args "
                              "and kwargs to create Row")
+        if sys.version_info[:2] < (3, 6):
+            # Remove this block once support for Python < 3.6 is dropped
+            from collections import OrderedDict
+            if kwargs:
+                raise ValueError("Named arguments are not allowed for Python version < 3.6, "
+                                 "use a collections.OrderedDict instead. To enable Spark 2.x "
+                                 "compatible Rows, set the environment variable "
+                                 "'PYSPARK_LEGACY_ROW_ENABLED' to 'true'.")
+            elif len(args) == 1 and isinstance(args[0], OrderedDict):
+                kwargs = args[0]
+
         if kwargs:
             # create row objects
-            names = sorted(kwargs.keys())
 
 Review comment:
  Actually, on second thought, why don't we just have an env variable to switch the sorting
  on and off, disable it in Spark 3.0, and then remove the env variable in Spark 3.1? I suspect
  that would need fewer changes than having a separate class for the legacy row.
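
  A minimal sketch of what that env-variable toggle might look like (the variable name
  'PYSPARK_ROW_FIELD_SORTING_ENABLED' and the exact plumbing below are assumptions for
  illustration, not taken from this PR):

      import os

      # Sketch only: gate the Spark 2.x field-sorting behavior behind an
      # environment variable instead of introducing a separate legacy Row class.
      # The variable name is assumed here; the real flag may differ.
      _sort_fields = (
          os.environ.get("PYSPARK_ROW_FIELD_SORTING_ENABLED", "false").lower() == "true")

      class Row(tuple):
          def __new__(cls, *args, **kwargs):
              if args and kwargs:
                  raise ValueError("Can not use both args "
                                   "and kwargs to create Row")
              if kwargs:
                  names = list(kwargs.keys())
                  if _sort_fields:
                      # Legacy (Spark 2.x) behavior: sort fields by name.
                      names = sorted(names)
                  row = tuple.__new__(cls, [kwargs[n] for n in names])
                  row.__fields__ = names
                  return row
              return tuple.__new__(cls, args)

  With the flag left at its default of 'false', field order follows the keyword-argument
  order as in the updated doctest above; setting it to 'true' restores the sorted Spark 2.x
  order, and the flag itself could then be dropped in Spark 3.1.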

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

