hive-issues mailing list archives

From "Naveen Gangam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9755) Hive built-in "ngram" UDAF fails when a mapper has no matches.
Date Mon, 23 Feb 2015 22:07:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333879#comment-14333879 ]

Naveen Gangam commented on HIVE-9755:
-------------------------------------

When a mapper returns an empty result set, the ngram UDAF has nothing to merge from that
mapper when merge() runs in the reduce phase. The relevant code in merge() is:
{code}
int n = Integer.parseInt(partialNGrams.get(partialNGrams.size()-1).toString());
if(myagg.n > 0 && myagg.n != n) {
  throw new HiveException(getClass().getSimpleName() + ": mismatch in value for 'n'"
      + ", which usually is caused by a non-constant expression. Found '"+n+"' and '"
      + myagg.n + "'.");
}
{code}
In the code snippet above, the variables "n" and "myagg.n" are meant to hold the same value
(the n in nGrams). Each mapper appends this value to the end of its partial nGrams list.
However, the value is only initialized during the map phase, in the iterate() method. If
iterate() is never called because the mapper's result set is empty, the value is never set to
the "n" from the query, so it keeps Java's default int value of 0.

The merge() method currently checks only for null partial objects:
{code}
    public void merge(AggregationBuffer agg, Object partial) throws HiveException {
      if(partial == null) {
        return;
      }
{code}

Given the design, there is always at least one element in this partial buffer (the "n" value),
so it may never be null. merge() should therefore be a no-op when the value of "n" is zero.
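
To make the proposed guard concrete, here is a minimal, self-contained sketch. It is not the Hive code and not the patch: NGramBufferSketch, its fields, and the use of IllegalStateException in place of HiveException are all assumptions made for illustration, and the partial result is reduced to a list of strings whose last element is "n".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the UDAF's aggregation buffer.
// It models only the "n is appended last" convention described above.
class NGramBufferSketch {
    int n = 0;                                   // stays 0 unless iterate() runs
    final List<String> grams = new ArrayList<>();

    // Map side: called once per input row; this is where 'n' gets set.
    void iterate(String gram, int queryN) {
        n = queryN;
        grams.add(gram);
    }

    // Map side: emits the partial result with 'n' as the last element.
    List<String> terminatePartial() {
        List<String> partial = new ArrayList<>(grams);
        partial.add(Integer.toString(n));
        return partial;
    }

    // Reduce side: merges one mapper's partial result into this buffer.
    void merge(List<String> partial) {
        if (partial == null) {
            return;
        }
        int partialN = Integer.parseInt(partial.get(partial.size() - 1));
        if (partialN == 0) {
            return;                              // mapper saw no rows: no-op
        }
        if (n > 0 && n != partialN) {
            // The real evaluator throws HiveException here.
            throw new IllegalStateException("mismatch in value for 'n'. Found '"
                + partialN + "' and '" + n + "'.");
        }
        n = partialN;
        grams.addAll(partial.subList(0, partial.size() - 1));
    }
}
```

In this sketch, a partial produced by a mapper that never called iterate() ends in "0" and is skipped, so it can no longer trip the mismatch check against a buffer that already holds a non-zero "n".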

I will upload a patch shortly.
 

> Hive built-in "ngram" UDAF fails when a mapper has no matches.
> --------------------------------------------------------------
>
>                 Key: HIVE-9755
>                 URL: https://issues.apache.org/jira/browse/HIVE-9755
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.14.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Critical
>
> hive> describe ngramtest;
> OK
> col1                	int                 	                    
> col3                	string              	                    
> Time taken: 0.192 seconds, Fetched: 2 row(s)
> SELECT explode(ngrams(sentences(lower(t.col3)), 3, 10)) as x FROM (SELECT col3 FROM ngramtest WHERE col1=0) t;
> When any result has a null value, the query returns the following error:
> 2015-01-08 09:15:00,262 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":["0","0","0","0"]},"alias":0}
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: GenericUDAFnGramEvaluator: mismatch in value for 'n', which usually is caused by a non-constant expression. Found '0' and '1'.
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams$GenericUDAFnGramEvaluator.merge(GenericUDAFnGrams.java:242)
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:142)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:911)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:753)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
