spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] maropu edited a comment on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
Date Wed, 12 May 2021 04:49:03 GMT

maropu edited a comment on pull request #32424:
URL: https://github.com/apache/spark/pull/32424#issuecomment-839430081


   > Why it's a problem only in scala API? how about SQL API?
   
   In SQL, user-specified param names are used as they are, so the same issue cannot happen:
   ```
   scala> val df = Seq((Seq(1,2,3), Seq("a", "b", "c"))).toDF("numbers", "letters")
   scala> df.selectExpr("""
        |     FLATTEN(
        |         TRANSFORM(
        |             numbers,
        |             number -> TRANSFORM(
        |                 letters,
        |                 letter -> (number AS number, letter AS letter)
        |             )
        |         )
        |     ) AS zipped
        | """).explain(true)
   
   == Analyzed Logical Plan ==
   zipped: array<struct<number:int,letter:string>>
   Project [flatten(transform(numbers#7, lambdafunction(transform(letters#8, lambdafunction(named_struct(number, lambda number#14, letter, lambda letter#15), lambda letter#15, false)), lambda number#14, false))) AS zipped#13]
                                                                                                                 ^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^^^
   +- Project [_1#2 AS numbers#7, _2#3 AS letters#8]
      +- LocalRelation [_1#2, _2#3]
   ```
   On the other hand, in the DataFrame APIs, the same hard-coded param names (`x`, `y`, and `z`) were used in every lambda function, so a name conflict could happen:
   ```
   scala> df.select(
        |     flatten(
        |         transform(
        |             $"numbers",
        |             (number: Column) => { transform(
        |                 $"letters",
        |                 (letter: Column) => { struct(
        |                     number.as("number"),
        |                     letter.as("letter")
        |                 ) }
        |             ) }
        |         )
        |     ).as("zipped")
        | ).explain(true)
   
   == Analyzed Logical Plan ==
   zipped: array<struct<number:int,letter:string>>
   Project [flatten(transform(numbers#7, lambdafunction(transform(letters#8, lambdafunction(struct(number, lambda x_0#20, letter, lambda x_1#21), lambda x_1#21, false)), lambda x_0#20, false))) AS zipped#19]
                                                                                                           ^^^^^^^^^^^^^          ^^^^^^^^^^^^^
   +- Project [_1#2 AS numbers#7, _2#3 AS letters#8]
      +- LocalRelation [_1#2, _2#3]
   ```
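
   To make the conflict concrete, here is a minimal toy sketch in plain Scala (no Spark dependency). It is not Spark's analyzer: `Expr`, `Var`, `Lambda`, `Pair`, `resolve`, and `nested` are hypothetical names made up for this illustration, and resolving a variable to the innermost lambda with the same name is an assumption about how name-based binding behaves. It only shows why reusing one param name at both nesting levels loses the outer variable, while distinct names keep both:

   ```scala
   // Toy model of name-based lambda variable resolution; NOT Spark's analyzer.
   sealed trait Expr
   case class Var(name: String) extends Expr                 // reference looked up by name
   case class Lambda(param: String, body: Expr) extends Expr // one-argument lambda
   case class Pair(left: Expr, right: Expr) extends Expr     // stands in for struct(a, b)

   object LambdaNames {
     // Bind each Var to the nesting depth of the innermost enclosing Lambda
     // whose parameter has the same name.
     def resolve(e: Expr, scope: Map[String, Int] = Map.empty, depth: Int = 0): List[(String, Int)] =
       e match {
         case Var(n)       => List(n -> scope(n))
         case Lambda(p, b) => resolve(b, scope + (p -> depth), depth + 1)
         case Pair(l, r)   => resolve(l, scope, depth) ++ resolve(r, scope, depth)
       }

     // Build nested lambdas the way a DSL would: the caller writes Scala closures,
     // and the DSL (not the caller) chooses the variable names.
     def nested(outerName: String, innerName: String): Expr = {
       def lambda(name: String)(f: Expr => Expr): Expr = Lambda(name, f(Var(name)))
       lambda(outerName)(number => lambda(innerName)(letter => Pair(number, letter)))
     }

     def main(args: Array[String]): Unit = {
       // Same name at both levels: the outer reference is captured by the inner lambda.
       println(resolve(nested("x", "x")))     // List((x,1), (x,1))
       // Distinct names per lambda: each reference binds to its own lambda.
       println(resolve(nested("x_0", "x_1"))) // List((x_0,0), (x_1,1))
     }
   }
   ```

   In the first call both struct fields bind to the inner lambda (depth 1), which mirrors the conflict described above; in the second, `x_0` and `x_1` stay distinct, as in the analyzed plan.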


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
