hadoop-hive-dev mailing list archives

From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1376) Simple UDAFs with more than 1 parameter crash on empty row query
Date Tue, 01 Jun 2010 20:50:48 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874232#action_12874232 ]

John Sichi commented on HIVE-1376:

Some more details on this:

* In the case of a full-table aggregation (no group by key) where no rows exist (or all get
filtered out), the aggregation framework sends a row of all nulls to the aggregator.  I don't
know why this is necessary, since all of the existing aggregators ignore the null anyway.

* Since the percentile UDAF uses a primitive double for the parameter type to the iterate
method (rather than a Double or a DoubleWritable), Java reflection throws an IllegalArgumentException
because it can't convert a null to a primitive.
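
The reflection failure described above can be reproduced outside Hive. The following is a minimal sketch, not Hive code: PrimitiveEvaluator and BoxedEvaluator are hypothetical stand-ins for a UDAF evaluator, and Long is used in place of LongWritable so the example has no Hadoop dependency.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class NullToPrimitiveDemo {

    // Hypothetical evaluator whose second parameter is a primitive double,
    // mirroring the shape of the percentile iterate() signature.
    public static class PrimitiveEvaluator {
        public boolean iterate(Long value, double percentile) {
            return true;
        }
    }

    // Same shape, but with a boxed Double that can hold a null.
    public static class BoxedEvaluator {
        public boolean iterate(Long value, Double percentile) {
            // Ignore the all-null row instead of failing.
            return value != null && percentile != null;
        }
    }

    // Invokes iterate(null, null) reflectively and reports what happened.
    public static String invokeWithNulls(Object evaluator, Class<?>... paramTypes) {
        try {
            Method m = evaluator.getClass().getMethod("iterate", paramTypes);
            Object result = m.invoke(evaluator, new Object[] {null, null});
            return "returned " + result;
        } catch (IllegalArgumentException e) {
            // Reflection cannot unbox a null into a primitive parameter.
            return "IllegalArgumentException";
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeWithNulls(new PrimitiveEvaluator(), Long.class, double.class));
        System.out.println(invokeWithNulls(new BoxedEvaluator(), Long.class, Double.class));
    }
}
```

With the primitive parameter the call fails before iterate is ever entered; with the boxed parameter the null reaches the method, which is then free to ignore it.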

There are three possible solutions:

(1) change percentile to use a non-primitive type

(2) add more reflection and skip the attempt to send the null to iterate in the case where
the parameter type is primitive

(3) avoid sending the null in the first place (unless someone can explain why it's needed,
or some regression test fails when we try it)
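
As a rough illustration of what option (2) might look like, here is a sketch that inspects the target method's parameter types and skips the call when a null would land on a primitive parameter. This is only an assumption about the shape of such a check, not the actual Hive code path; the class and the helper names (findMethod, hasNullForPrimitive, invokeUnlessNullForPrimitive) are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class SkipNullForPrimitive {

    // Hypothetical evaluator with a primitive parameter; counts real calls.
    public static class PrimitiveEvaluator {
        public int calls = 0;

        public boolean iterate(Long value, double percentile) {
            calls++;
            return true;
        }
    }

    // Returns the first public method with the given name, or null if none.
    public static Method findMethod(Class<?> type, String name) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name)) {
                return m;
            }
        }
        return null;
    }

    // True if any null argument lines up with a primitive parameter type.
    public static boolean hasNullForPrimitive(Method m, Object[] args) {
        Class<?>[] types = m.getParameterTypes();
        for (int i = 0; i < types.length; i++) {
            if (types[i].isPrimitive() && args[i] == null) {
                return true;
            }
        }
        return false;
    }

    // Invokes the method unless doing so would pass null to a primitive.
    // Returns true if the method was actually invoked.
    public static boolean invokeUnlessNullForPrimitive(Object target, Method m, Object[] args) {
        if (hasNullForPrimitive(m, args)) {
            return false; // skip the all-null row instead of throwing
        }
        try {
            m.invoke(target, args);
            return true;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The trade-off is an extra parameter-type scan per invocation (which could be cached per method), whereas option (1) needs no framework change and option (3) removes the null row at its source.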

> Simple UDAFs with more than 1 parameter crash on empty row query 
> -----------------------------------------------------------------
>                 Key: HIVE-1376
>                 URL: https://issues.apache.org/jira/browse/HIVE-1376
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Mayank Lahiri
>             Fix For: 0.6.0
> Simple UDAFs with more than 1 parameter crash when the query returns no rows. Currently,
> this only seems to affect the percentile() UDAF where the second parameter is the percentile
> to be computed (of type double). I've also verified the bug by adding a dummy parameter to
> ExampleMin in contrib.
> On an empty query, Hive seems to be trying to resolve an iterate() method with signature
> {null,null} instead of {null,double}. You can reproduce this bug using:
> CREATE TABLE pct_test ( val INT );
> SELECT percentile(val, 0.5) FROM pct_test;
> which produces a lot of errors like: 
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method
> public boolean org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator.iterate(org.apache.hadoop.io.LongWritable,double)
> on object org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator@11d13272 of
> class org.apache.hadoop.hive.ql.udf.UDAFPercentile$PercentileLongEvaluator with arguments
> {null, null} of size 2

