drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
Date Thu, 02 Mar 2017 18:51:45 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892779#comment-15892779 ]

ASF GitHub Bot commented on DRILL-4963:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/701
  
    @jinfengni,
    
    As it turns out, we do have a comprehensive design for the original feature and the MVCC
revision. The key goal is that a function, once registered, is guaranteed to be available
on all Drillbits once it is visible to any particular Foreman. Without this guarantee of consistency,
DUDFs become non-deterministic and will cause customer problems.
    
    We do have a "refresh" operation: registering a DUDF updates ZK, which sends updates to
each node. The problem is the race condition. Suppose I register a UDF foo() on node A, then
run a query from that same node. If my query happens to hit node B before the ZK notification
arrives there, the query will fail. Our goal is that such a failure cannot happen, hence the
need for a "pull" model to augment the ZK-based "push" model.
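
    The race can be illustrated with a small simulation. This is only a sketch, not Drill's
actual implementation: SharedRegistry, Node, pull_on_miss and resolve() are hypothetical names,
and the shared registry is assumed to expose a simple version counter. It shows a node that
relies on push alone failing to resolve foo() before the notification arrives, while a node that
also pulls on a miss succeeds.

```python
class SharedRegistry:
    """Stand-in for the ZK-backed registry; the name and API are hypothetical."""

    def __init__(self):
        self.version = 0
        self.functions = set()

    def register(self, name):
        self.functions.add(name)
        self.version += 1


class Node:
    """One Drillbit's local view of the registry (illustrative only)."""

    def __init__(self, registry, pull_on_miss):
        self.registry = registry
        self.pull_on_miss = pull_on_miss
        self.local_version = registry.version
        self.local_functions = set(registry.functions)

    def sync(self):
        # Copy the shared state; a delivered "push" notification would trigger this too.
        self.local_functions = set(self.registry.functions)
        self.local_version = self.registry.version

    def resolve(self, name):
        if name in self.local_functions:
            return True
        # Pull model: on a miss, check whether the local copy is stale before failing.
        if self.pull_on_miss and self.local_version != self.registry.version:
            self.sync()
            return name in self.local_functions
        return False


registry = SharedRegistry()
push_only = Node(registry, pull_on_miss=False)
push_and_pull = Node(registry, pull_on_miss=True)

# foo() is registered, but the push notification has not reached either node yet.
registry.register("foo")

push_only_result = push_only.resolve("foo")
push_and_pull_result = push_and_pull.resolve("foo")
print(push_only_result, push_and_pull_result)
```

    The pull happens only on a lookup miss with a stale version, so the common path (function
already known locally) does not touch the shared registry.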
    
    A manual "update" would have the same issue unless we synchronized the update across all
nodes. Also, the only way to ensure that DUDFs are available is to issue an update after adding
each DUDF. But, if we did that, we might as well make the DUDF registration itself synchronous
across all nodes.
    
    And, of course, the node synchronization does not handle the race condition in which a
new node comes up right after a synchronization starts. We'd have to ensure that the new node
reads the proper state from ZK. We can do that if we first update ZK, then do synchronization
to all nodes, then update ZK with the fact that all nodes are aware of the DUDF. 
    
    Without the "two-phase" process, our new node can come up, learn of the new DUDF and issue
a query using the DUDF without some nodes having been notified of the synchronization.
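
    One way to picture the "two-phase" process is the sketch below. It is an interpretation
under stated assumptions, not the design itself: the PENDING/ACTIVE markers, SharedState,
Drillbit and can_plan_with() are hypothetical. A function is first written to the shared state
as pending, every node loads it, and only then is it marked active. A node that joins mid-way
loads what it finds but does not plan queries with a function that is still pending.

```python
PENDING, ACTIVE = "pending", "active"


class SharedState:
    """Stand-in for the state kept in ZK; maps function name to status (hypothetical)."""

    def __init__(self):
        self.status = {}


class Drillbit:
    def __init__(self, shared):
        self.shared = shared
        # A node that starts up loads every function it finds, pending or active.
        self.loaded = set(shared.status)

    def load(self, name):
        self.loaded.add(name)

    def can_plan_with(self, name):
        # Only functions confirmed on all nodes may be used when planning a query.
        return self.shared.status.get(name) == ACTIVE


shared = SharedState()
a, b = Drillbit(shared), Drillbit(shared)

# Phase 1: record the function in shared state, then start synchronizing nodes.
shared.status["foo"] = PENDING
a.load("foo")

# A new node comes up while synchronization is still in progress.
c = Drillbit(shared)
visible_mid_sync = c.can_plan_with("foo")

b.load("foo")

# Phase 2: all nodes have the function, so record that fact in shared state.
shared.status["foo"] = ACTIVE
visible_after = c.can_plan_with("foo")
all_loaded = all("foo" in node.loaded for node in (a, b, c))
print(visible_mid_sync, visible_after, all_loaded)
```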
    
    Overall, this is a difficult area. Relying on the well-known semantics of MVCC makes the
problems much easier to solve.
    
    So, the question here is whether it is worth checking in this partial solution for 1.10,
or whether to leave the problem open until a complete solution is available.


> Issues when overloading Drill native functions with dynamic UDFs
> ----------------------------------------------------------------
>
>                 Key: DRILL-4963
>                 URL: https://issues.apache.org/jira/browse/DRILL-4963
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.9.0
>            Reporter: Roman
>            Assignee: Arina Ielchiieva
>              Labels: ready-to-commit
>             Fix For: Future
>
>         Attachments: subquery_udf-1.0.jar, subquery_udf-1.0-sources.jar, test_overloading-1.0.jar, test_overloading-1.0-sources.jar
>
>
> I created a jar file which overloads 3 Drill native functions (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED)
> and ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF.
> If I try to use my functions, I get errors:
> {code:xml}
> SELECT CURRENT_DATE('test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR)
> SQL Query null
> {code:xml}
> SELECT ABS('test','test') FROM (VALUES(1));
> {code}
> Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR)
> SQL Query null
> {code:xml}
> SELECT LOG('test') FROM (VALUES(1));
> {code}
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing expression in constant expression evaluator LOG('test').  Errors: 
> Error in expression at index -1.  Error: Missing function implementation: castTINYINT(VARCHAR-REQUIRED).  Full expression: UNKNOWN EXPRESSION.
> But if I rerun all these queries after the "DrillRuntimeException", they run correctly.
> It seems that Drill had not updated the function signature before that error. Also, if I add
> the jar as a regular UDF (copy the jar to /drill_home/jars/3rdparty and restart the drillbits),
> all queries run correctly without errors.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
