spark-issues mailing list archives

From "Parth Gandhi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24935) Problem with Executing Hive UDF's from Spark 2.2 Onwards
Date Sun, 29 Jul 2018 20:57:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561266#comment-16561266
] 

Parth Gandhi commented on SPARK-24935:
--------------------------------------

[~cloud_fan] I had the same suspicion: Hive UDAFs may still not fully support partial
aggregation, though I am not sure. Would it make sense to add back support for complete
aggregation mode to ensure backward compatibility? Thank you.

> Problem with Executing Hive UDF's from Spark 2.2 Onwards
> --------------------------------------------------------
>
>                 Key: SPARK-24935
>                 URL: https://issues.apache.org/jira/browse/SPARK-24935
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.1
>            Reporter: Parth Gandhi
>            Priority: Major
>
> A user of the sketches library (https://github.com/DataSketches/sketches-hive) reported an
> issue with the HLL Sketch Hive UDAF that seems to be a bug in Spark or Hive. Their code runs fine
> in 2.1 but fails from 2.2 onwards. For more details on the issue, refer to
> the discussion on the sketches-user list:
> [https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ]
>
> On further debugging, we found that from 2.2 onwards, Spark's Hive UDAF support provides
> partial aggregation and has removed the functionality that supported complete
> mode aggregation (refer to https://issues.apache.org/jira/browse/SPARK-19060 and https://issues.apache.org/jira/browse/SPARK-18186).
> Thus, where the UDAF expects the update method to be called, the merge method is called instead ([https://github.com/DataSketches/sketches-hive/blob/master/src/main/java/com/yahoo/sketches/hive/hll/SketchEvaluator.java#L56]), which
> throws the exception described in the forum thread above.
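
To illustrate the difference between complete and partial aggregation described above, here is a minimal sketch. It is a toy model, not Spark or Hive code: the class UdafModeSketch and its methods are hypothetical. Only the four mode names and the callback names (iterate, merge, terminatePartial, terminate) are meant to mirror Hive's GenericUDAFEvaluator, where iterate is the per-row callback that the description above calls the update method. The assumption is that an engine running in complete mode drives a single COMPLETE stage, while an engine using partial aggregation drives PARTIAL1 followed by FINAL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only (hypothetical class): records which evaluator callbacks an
// engine would invoke for each aggregation mode. No Hive or Spark dependency.
class UdafModeSketch {
    // Mode names mirror Hive's GenericUDAFEvaluator.Mode.
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    // Callbacks invoked for a single stage running in the given mode.
    static List<String> callbacksFor(Mode mode) {
        switch (mode) {
            case PARTIAL1: return Arrays.asList("iterate", "terminatePartial");
            case PARTIAL2: return Arrays.asList("merge", "terminatePartial");
            case FINAL:    return Arrays.asList("merge", "terminate");
            case COMPLETE: return Arrays.asList("iterate", "terminate");
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    // Callbacks invoked across a whole plan, stage by stage.
    static List<String> callbacksForPlan(List<Mode> plan) {
        List<String> calls = new ArrayList<>();
        for (Mode mode : plan) {
            calls.addAll(callbacksFor(mode));
        }
        return calls;
    }

    public static void main(String[] args) {
        // Assumed complete-mode plan: merge is never called.
        System.out.println(callbacksForPlan(Arrays.asList(Mode.COMPLETE)));
        // Assumed partial-aggregation plan: merge is called in the FINAL stage.
        System.out.println(callbacksForPlan(Arrays.asList(Mode.PARTIAL1, Mode.FINAL)));
    }
}
```

Under these assumptions, a UDAF whose merge path was never exercised in complete mode starts receiving merge calls once the engine switches to partial aggregation, which is consistent with the behavior change reported here.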




