hive-issues mailing list archives

From "Naveen Gangam (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-9755) Hive built-in "ngram" UDAF fails when a mapper has no matches.
Date Tue, 24 Feb 2015 04:16:12 GMT

     [ https://issues.apache.org/jira/browse/HIVE-9755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naveen Gangam updated HIVE-9755:
--------------------------------
    Attachment: HIVE-9755.patch

When a mapper returns an empty set, the merge() method should be a no-op during the reduce
phase of the ngram UDAF. A value of ZERO in the list returned by that mapper (its one and only
item) indicates that the iterate() method was never called in that map job. The patch therefore
returns from merge() without taking any action in that case.
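
For readers without the patch at hand, the guard described above can be sketched roughly as follows. This is a standalone illustration, not the code in HIVE-9755.patch: the class NGramMergeSketch, its Buffer type, and the partial-result layout (n-gram strings followed by n as the last element) are all assumptions made so the example runs without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NGramMergeSketch {
    /** Hypothetical, simplified stand-in for the UDAF's aggregation buffer. */
    static class Buffer {
        int n = 0;                              // 0 means "not initialized yet"
        List<String> grams = new ArrayList<>();
    }

    /**
     * Merges one mapper's partial result into the buffer.
     * Assumed layout: zero or more n-gram strings, then n as the last element.
     */
    static void merge(Buffer agg, List<String> partial) {
        if (partial == null || partial.isEmpty()) {
            return;
        }
        int n = Integer.parseInt(partial.get(partial.size() - 1));
        if (n == 0) {
            // iterate() never ran in that map task, so there is nothing to merge.
            return;
        }
        if (agg.n > 0 && agg.n != n) {
            throw new IllegalStateException(
                "mismatch in value for 'n': found '" + agg.n + "' and '" + n + "'");
        }
        agg.n = n;
        agg.grams.addAll(partial.subList(0, partial.size() - 1));
    }

    public static void main(String[] args) {
        Buffer agg = new Buffer();
        merge(agg, Arrays.asList("a b c", "b c d", "3")); // mapper with matches
        merge(agg, Arrays.asList("0"));                   // mapper with no matches: ignored
        System.out.println(agg.n + " " + agg.grams.size());
    }
}
```

Without the n == 0 check, the second merge() call in main would reach the mismatch test and throw, which is the same kind of failure shown in the trace below.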

> Hive built-in "ngram" UDAF fails when a mapper has no matches.
> --------------------------------------------------------------
>
>                 Key: HIVE-9755
>                 URL: https://issues.apache.org/jira/browse/HIVE-9755
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.14.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Critical
>         Attachments: HIVE-9755.patch
>
>
> hive> describe ngramtest;
> OK
> col1                	int                 	                    
> col3                	string              	                    
> Time taken: 0.192 seconds, Fetched: 2 row(s)
> SELECT explode(ngrams(sentences(lower(t.col3)), 3, 10)) as x FROM (SELECT col3 FROM ngramtest WHERE col1=0) t;
> When any result is null, the query fails with the following error:
> 2015-01-08 09:15:00,262 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":["0","0","0","0"]},"alias":0}
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:258)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: GenericUDAFnGramEvaluator: mismatch in value for 'n', which usually is caused by a non-constant expression. Found '0' and '1'.
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFnGrams$GenericUDAFnGramEvaluator.merge(GenericUDAFnGrams.java:242)
> at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:142)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:658)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:911)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:753)
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:819)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
> at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:249)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
