hive-dev mailing list archives

From "Mithun Radhakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9377) UDF in_file() in WHERE predicate causes NPE.
Date Wed, 14 Jan 2015 20:52:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277650#comment-14277650 ]

Mithun Radhakrishnan commented on HIVE-9377:
--------------------------------------------

This problem arises because of what happens after predicate push-down (PPD). {{OpProcFactory.FilterPPD}}
causes the Filter operator to be replaced with a cloned instance, and the predicate's UDF is cloned via
{{FunctionRegistry.cloneGenericUDF()}}. Here's an excerpt:

{code:title=FunctionRegistry.java|borderStyle=solid}
    if (clonedUDF != null) {
      // Copy info that may be required in the new copy.
      // The SettableUDF calls below could be replaced using this mechanism as well.
      try {
        genericUDF.copyToNewInstance(clonedUDF);
      } catch (UDFArgumentException err) {
        throw new IllegalArgumentException(err);
      }
...
{code}

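{{GenericUDFInFile}} doesn't implement {{copyToNewInstance()}}, so the clone remains uninitialized.
When the next optimizer after PPD runs (in this case {{ConstantPropagate}}), it calls
{{GenericUDFInFile.getRequiredFiles()}} on that uninitialized clone, and we get the NPE shown in the description.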
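To make the failure mode concrete, here is a minimal, self-contained Java model of the clone-then-copy pattern. It deliberately does not use Hive's classes: {{ModelUdf}}, {{InFileModel}}, {{filePathConstant}} and {{cloneUdf()}} are hypothetical stand-ins. It assumes, as a simplification, that the clone is built through a no-arg constructor and that state set up in {{initialize()}} survives only if {{copyToNewInstance()}} copies it across.

```java
// Hypothetical model of the HIVE-9377 failure; these are NOT Hive's real classes.
public class CloneWithoutCopyDemo {

    // Stand-in for GenericUDF: state is set in initialize(), not in the constructor.
    abstract static class ModelUdf {
        abstract void initialize(String constantFileArg);

        // Default copy hook: copies nothing.
        void copyToNewInstance(ModelUdf newInstance) {
        }
    }

    // Stand-in for GenericUDFInFile, WITHOUT a copyToNewInstance() override.
    static class InFileModel extends ModelUdf {
        String filePathConstant; // null until initialize() runs

        @Override
        void initialize(String constantFileArg) {
            this.filePathConstant = constantFileArg;
        }

        // Analogue of getRequiredFiles(): dereferences initialize()-time state.
        String[] getRequiredFiles() {
            return new String[] { filePathConstant.trim() };
        }
    }

    // Analogue of cloneGenericUDF(): fresh instance via the no-arg constructor, then the copy hook.
    static ModelUdf cloneUdf(ModelUdf udf) {
        try {
            ModelUdf cloned = udf.getClass().getDeclaredConstructor().newInstance();
            udf.copyToNewInstance(cloned);
            return cloned;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InFileModel original = new InFileModel();
        original.initialize("/tmp/bar_list.txt");
        InFileModel clone = (InFileModel) cloneUdf(original);
        try {
            clone.getRequiredFiles();
            System.out.println("no NPE");
        } catch (NullPointerException e) {
            // The clone never ran initialize(), and the default hook copied nothing.
            System.out.println("NPE: clone was never initialized");
        }
    }
}
```

Running {{main()}} takes the {{NullPointerException}} branch, which is the same shape as the real trace: the original instance is fine, but the optimizer only ever sees the clone.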

The fix introduces an implementation of {{copyToNewInstance()}} that adopts/clones the members
of {{GenericUDFInFile}}. I think it's safe to adopt the underlying ObjectInspector members, since
the old Filter operator (and hence the predicate and its UDF) is discarded as part of PPD, but I'd
like confirmation.
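The shape of such a fix can be sketched with the same kind of simplified model. This is not the actual patch: {{ModelUdf}}, {{InFileModel}}, {{filePathConstant}} and {{cloneUdf()}} are hypothetical stand-ins, and the single String field stands in for the ObjectInspector members. The override adopts the member by reference, which matches the assumption above that the original UDF is discarded after PPD.

```java
// Hypothetical model of the HIVE-9377 fix; these are NOT Hive's real classes.
public class CopyToNewInstanceDemo {

    // Stand-in for GenericUDF: state is set in initialize(), not in the constructor.
    abstract static class ModelUdf {
        abstract void initialize(String constantFileArg);

        // Default copy hook: copies nothing; subclasses with initialize()-time state override it.
        void copyToNewInstance(ModelUdf newInstance) {
        }
    }

    // Stand-in for GenericUDFInFile, now WITH a copyToNewInstance() override.
    static class InFileModel extends ModelUdf {
        String filePathConstant; // stands in for the constant file-path ObjectInspector

        @Override
        void initialize(String constantFileArg) {
            this.filePathConstant = constantFileArg;
        }

        String[] getRequiredFiles() {
            return new String[] { filePathConstant.trim() };
        }

        @Override
        void copyToNewInstance(ModelUdf newInstance) {
            if (!(newInstance instanceof InFileModel)) {
                throw new IllegalArgumentException("Expected an InFileModel clone");
            }
            // Adopt the reference rather than deep-copying it: clone and original now share
            // this member, which is only safe if the original is discarded afterwards.
            ((InFileModel) newInstance).filePathConstant = this.filePathConstant;
        }
    }

    // Analogue of cloneGenericUDF(): fresh instance via the no-arg constructor, then the copy hook.
    static ModelUdf cloneUdf(ModelUdf udf) {
        try {
            ModelUdf cloned = udf.getClass().getDeclaredConstructor().newInstance();
            udf.copyToNewInstance(cloned);
            return cloned;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InFileModel original = new InFileModel();
        original.initialize("/tmp/bar_list.txt");
        InFileModel clone = (InFileModel) cloneUdf(original);
        System.out.println(clone.getRequiredFiles()[0]); // prints /tmp/bar_list.txt
    }
}
```

Adopting is cheap and sufficient if nothing touches the original again. If the original UDF could still be used after PPD, cloning the members instead would be the safer choice, which is exactly the point that needs confirmation.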


> UDF in_file() in WHERE predicate causes NPE.
> --------------------------------------------
>
>                 Key: HIVE-9377
>                 URL: https://issues.apache.org/jira/browse/HIVE-9377
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>
> Consider the following query:
> {code:sql}
> SELECT foo, bar from mythdb.foobar where in_file( bar, '/tmp/bar_list.txt' );
> {code}
> Using {{in_file()}} in a WHERE predicate causes the following NPE:
> {noformat}
> java.lang.NullPointerException
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getWritableConstantValue(ObjectInspectorUtils.java:1041)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFInFile.getRequiredFiles(GenericUDFInFile.java:93)
> 	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.isDeterministicUdf(ConstantPropagateProcFactory.java:303)
> 	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:226)
> 	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.access$000(ConstantPropagateProcFactory.java:92)
> 	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateFilterProc.process(ConstantPropagateProcFactory.java:623)
> 	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> 	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagate$ConstantPropagateWalker.walk(ConstantPropagate.java:147)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
> 	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagate.transform(ConstantPropagate.java:117)
> 	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:177)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10032)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:189)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
> 	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
> 	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1156)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304)
> 	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:701)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:674)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}
> I have a tentative fix I need advice on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
