Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B781B17337 for ; Wed, 18 Mar 2015 02:29:38 +0000 (UTC) Received: (qmail 96603 invoked by uid 500); 18 Mar 2015 02:29:38 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 96573 invoked by uid 500); 18 Mar 2015 02:29:38 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 96563 invoked by uid 99); 18 Mar 2015 02:29:38 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Mar 2015 02:29:38 +0000 Date: Wed, 18 Mar 2015 02:29:38 +0000 (UTC) From: "Aman Sinha (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (DRILL-2488) Wrong result on join between two subqueries with aggregation MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-2488: ------------------------------ Attachment: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch It turns out to be an issue with supported encoding for MergeJoin. Merge Join execution operator currently does not process incoming batches with SV2 or SV4, so if there was a Limit or Sort below, we need to insert a SelectionVectorRemover below the MJ. Uploaded a patch with a simple fix. [~vkorukanti] could you please review ? I haven't run all tests yet..still in process. > Wrong result on join between two subqueries with aggregation > ------------------------------------------------------------ > > Key: DRILL-2488 > URL: https://issues.apache.org/jira/browse/DRILL-2488 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators > Affects Versions: 0.8.0 > Reporter: Victoria Markman > Assignee: Aman Sinha > Priority: Critical > Fix For: 0.9.0 > > Attachments: 0001-DRILL-2488-Return-DEFAULT-as-supported-encoding-for-.patch, t1.parquet > > > {code} > 0: jdbc:drill:schema=dfs> select * from t1; > +------------+------------+------------+ > | a1 | b1 | c1 | > +------------+------------+------------+ > | 1 | aaaaa | 2015-01-01 | > | 2 | bbbbb | 2015-01-02 | > | 3 | ccccc | 2015-01-03 | > | 4 | null | 2015-01-04 | > | 5 | eeeee | 2015-01-05 | > | 6 | fffff | 2015-01-06 | > | 7 | ggggg | 2015-01-07 | > | null | hhhhh | 2015-01-08 | > | 9 | iiiii | null | > | 10 | jjjjj | 2015-01-10 | > +------------+------------+------------+ > 10 rows selected (0.15 seconds) > {code} > This result is incorrect, one row is missing > {code} > 0: jdbc:drill:schema=dfs> select * from > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq1(x1, y1) > . . . . . . . . . . . . > > . . . . . . . . . . . . > inner join > . . . . . . . . . . . . > > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq2(x1, y1) > . . . . . . . . . . . . > on > . . . . . . . . . . . . > sq1.x1 = sq2.x1 and > . . . . . . . . . . . . > sq2.y1 = sq2.y1 > . . . . . . . . . . . . > ; > +------------+------------+------------+------------+ > | x1 | y1 | x10 | y10 | > +------------+------------+------------+------------+ > | bbbbb | 1 | bbbbb | 1 | > | ccccc | 1 | ccccc | 1 | > | eeeee | 1 | eeeee | 1 | > | fffff | 1 | fffff | 1 | > +------------+------------+------------+------------+ > 4 rows selected (0.28 seconds) > {code} > Explain plan for the wrong result: > {code} > 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) > 00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3]) > 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner]) > 00-05 Limit(offset=[1], fetch=[5]) > 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) > 00-09 Sort(sort0=[$0], dir0=[ASC]) > 00-11 StreamAgg(group=[{0, 1}]) > 00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) > 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) > 00-04 Project(b10=[$0], EXPR$10=[$1]) > 00-06 SelectionVectorRemover > 00-08 Sort(sort0=[$0], dir0=[ASC]) > 00-10 Filter(condition=[=($1, $1)]) > 00-12 Limit(offset=[1], fetch=[5]) > 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)]) > 00-16 Sort(sort0=[$0], dir0=[ASC]) > 00-17 StreamAgg(group=[{0, 1}]) > 00-18 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) > 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]]) > {code} > If you turn off merge join, query returns correct result: > {code} > 0: jdbc:drill:schema=dfs> select * from > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq1(x1, y1) > . . . . . . . . . . . . > > . . . . . . . . . . . . > inner join > . . . . . . . . . . . . > ( > . . . . . . . . . . . . > select > . . . . . . . . . . . . > b1, > . . . . . . . . . . . . > count(distinct a1) > . . . . . . . . . . . . > from > . . . . . . . . . . . . > t1 > . . . . . . . . . . . . > group by > . . . . . . . . . . . . > b1 > . . . . . . . . . . . . > order by > . . . . . . . . . . . . > b1 limit 5 offset 1 > . . . . . . . . . . . . > ) as sq2(x1, y1) > . . . . . . . . . . . . > on > . . . . . . . . . . . . > sq1.x1 = sq2.x1 and > . . . . . . . . . . . . > sq2.y1 = sq2.y1 > . . . . . . . . . . . . > ; > +------------+------------+------------+------------+ > | x1 | y1 | x10 | y10 | > +------------+------------+------------+------------+ > | bbbbb | 1 | bbbbb | 1 | > | ccccc | 1 | ccccc | 1 | > | eeeee | 1 | eeeee | 1 | > | fffff | 1 | fffff | 1 | > | ggggg | 1 | ggggg | 1 | > +------------+------------+------------+------------+ > 5 rows selected (0.352 seconds) > {code} > cut/paste reproduction > {code:sql} > select * from > ( > select > b1, > count(distinct a1) > from > t1 > group by > b1 > order by > b1 limit 5 offset 1 > ) as sq1(x1, y1) > > inner join > > ( > select > b1, > count(distinct a1) > from > t1 > group by > b1 > order by > b1 limit 5 offset 1 > ) as sq2(x1, y1) > on > sq1.x1 = sq2.x1 and > sq2.y1 = sq2.y1 > ; > {code} > This test has been running and passing in regression test suite until framework was switched to a new code, where JSON parsing was replaced with jackson and for a brief period ( I believe Friday afternoon until now ) this suite was not executed. > We already have a merge join bug DRILL-2010, but this one seems to be different (small data set) and feels like a recent regression. > For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running with merge join disabled until this issue is fixed. Will need to remove alter session from this file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)