drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5375) Nested loop join: return correct result for left join
Date Mon, 03 Apr 2017 12:15:42 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953374#comment-15953374 ]

ASF GitHub Bot commented on DRILL-5375:
---------------------------------------

Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/794#discussion_r109204353
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java ---
    @@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
     
       /**
        * Method generates the runtime code needed for NLJ. Other than the setup method to set the input and output value
    -   * vector references we implement two more methods
    -   * 1. emitLeft()  -> Project record from the left side
    -   * 2. emitRight() -> Project record from the right side (which is a hyper container)
    +   * vector references we implement three more methods
    +   * 1. doEval() -> Evaluates if record from left side matches record from the right side
    +   * 2. emitLeft() -> Project record from the left side
    +   * 3. emitRight() -> Project record from the right side (which is a hyper container)
        * @return the runtime generated class that implements the NestedLoopJoin interface
    -   * @throws IOException
    -   * @throws ClassTransformationException
        */
    -  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException {
    -    final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException, SchemaChangeException {
    +    final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = CodeGenerator.get(
    +        NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
         nLJCodeGenerator.plainJavaCapable(true);
         // Uncomment out this line to debug the generated code.
     //    nLJCodeGenerator.saveCodeForDebugging(true);
         final ClassGenerator<NestedLoopJoin> nLJClassGenerator = nLJCodeGenerator.getRoot();
     
    +    // generate doEval
    +    final ErrorCollector collector = new ErrorCollectorImpl();
    +
    +
    +    /*
    +        Logical expression may contain fields from left and right batches. During code generation (materialization)
    +        we need to indicate from which input field should be taken. Mapping sets can work with only one input at a time.
    +        But non-equality expressions can be complex:
    +          select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 between t2.c1 and t2.c2
    +        or even contain self join which can not be transformed into filter since OR clause is present
    +          select *from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
    +
    +        In this case logical expression can not be split according to input presence (like during equality joins
    --- End diff --
    
    Agree. I have updated the comment.


> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
>                 Key: DRILL-5375
>                 URL: https://issues.apache.org/jira/browse/DRILL-5375
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> |     dt      |   fyq    |   who    |       event        |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}
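
For reference, the left-join semantics that the repro expects can be sketched in plain Java. This is only an illustration of the expected behaviour, not Drill's generated code or the change made in the pull request: the class name `NestedLoopLeftJoinSketch`, the method `leftJoin`, and the `T1`/`T2` arrays are hypothetical names introduced here, and the varchar `BETWEEN` is assumed to be an inclusive lexicographic comparison. The rows are copied from the repro above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
public class NestedLoopLeftJoinSketch {

    // t1 rows from the repro: {fyq, dts, dte}
    static final String[][] T1 = {
        {"2016-Q1", "2016-06-01", "2016-09-30"},
        {"2016-Q2", "2016-09-01", "2016-12-31"},
        {"2016-Q3", "2017-01-01", "2017-03-31"},
        {"2016-Q4", "2017-04-01", "2017-06-30"},
    };

    // t2 rows from the repro: {who, event, dt}
    static final String[][] T2 = {
        {"aperson", "did somthing", "2017-01-06"},
        {"aperson", "did somthing else", "2017-01-12"},
        {"aperson", "had chrsitmas", "2016-12-26"},
        {"aperson", "went wild", "2016-01-01"},
    };

    /**
     * Nested loop left join of left rows {who, event, dt} against right rows
     * {fyq, dts, dte} on "dt between dts and dte". Output fields: dt, fyq, who, event.
     */
    static List<String> leftJoin(String[][] left, String[][] right) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            boolean matched = false;
            for (String[] r : right) {
                // Assumed semantics: inclusive lexicographic comparison of varchar values.
                if (l[2].compareTo(r[1]) >= 0 && l[2].compareTo(r[2]) <= 0) {
                    out.add(l[2] + " | " + r[0] + " | " + l[0] + " | " + l[1]);
                    matched = true;
                }
            }
            // A left join must still emit a left row that found no match,
            // with NULL in place of the right-side columns.
            if (!matched) {
                out.add(l[2] + " | NULL | " + l[0] + " | " + l[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = leftJoin(T2, T1);
        Collections.sort(rows); // dt is the leading field, so this mimics "order by t2.dt"
        rows.forEach(System.out::println);
    }
}
```

The row for `2016-01-01` matches no range in `t1`, so it is the one emitted with `NULL`; that is the row missing from the Drill output in step 3.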



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
