Date: Thu, 8 Sep 2016 22:18:20 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@drill.apache.org
Subject: [jira] [Commented] (DRILL-4877) max(dir0), max(dir1) query against parquet data slower by 2X

[ https://issues.apache.org/jira/browse/DRILL-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475188#comment-15475188 ]

ASF GitHub Bot commented on DRILL-4877:
---------------------------------------

Github user jinfengni commented on the issue:
    https://github.com/apache/drill/pull/583

    +1

> max(dir0), max(dir1) query against parquet data slower by 2X
> ------------------------------------------------------------
>
>                 Key: DRILL-4877
>                 URL: https://issues.apache.org/jira/browse/DRILL-4877
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.9.0
>         Environment: 4 node cluster centos
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>
> max(dir0), max(dir1) query against parquet data slower by 2X
> The test was run with the metadata cache on both 1.7.0 and 1.9.0.
> The query plans differ, and execution time on 1.9.0 is close to 2X that on 1.7.0.
> Test from Drill 1.9.0 git commit id: 28d315bb
> on 4 node Centos cluster
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select max(dir0), max(dir1), max(dir2) from `DRILL_4589`;
> +---------+---------+---------+
> | EXPR$0  | EXPR$1  | EXPR$2  |
> +---------+---------+---------+
> | 2015    | Q4      | null    |
> +---------+---------+---------+
> 1 row selected (70.644 seconds)
> {noformat}
> Query plan for the above query; note that in Drill 1.9.0, usedMetadataFile does not appear in the query plan text.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select max(dir0), max(dir1), max(dir2) from `DRILL_4589`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2])
> 00-02        StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)], EXPR$2=[MAX($2)])
> 00-03          UnionExchange
> 01-01            StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)], EXPR$2=[MAX($2)])
> 01-02              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q1/f672.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q4/f162.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q2/f1101.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q2/f110.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q3/f1192.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q2/f174.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q4/f885.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q3/f1720.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q1/f1779.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q2/f629.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q4/f821.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q3/f896.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q2/f1458.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q4/f1756.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q2/f1490.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q3/f1137.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q1/f561.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q3/f1562.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q1/f1445.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q1/f236.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q4/f1209.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f518.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q4/f1598.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q1/f780.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q1/f1763.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q4/f381.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q1/f1870.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q1/f915.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q2/f673.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q1/f736.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q2/f749.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q3/f111.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q3/f776.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q1/f403.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q2/f904.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q4/f944.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q2/f506.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q4/f612.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q1/f1838.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q2/f1764.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q1/f684.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q4/f176.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q4/f150.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q3/f832.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q1/f967.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q4/f1733.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q2/f383.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q2/f1572.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q4/f1241.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q4/f1111.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q2/f1911.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q4/f1468.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q4/f1122.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q2/f1147.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q4/f1445.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q1/f1649.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q1/f1615.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q3/f1947.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q3/f1913.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q3/f1432.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q2/f353.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q2/f838.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q1/f1145.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q1/f1111.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q3/f1443.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q4/f676.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q4/f89.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q1/f1893.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q1/f1168.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q1/f1134.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q1/f441.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q3/f1924.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q2/f341.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f1430.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q3/f969.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q1/f1123.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q1/f1157.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q3/f1455.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q3/f1421.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q4/f654.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q2/f1159.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q4/f624.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q2/f287.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q2/f1583.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q4/f1881.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q3/f18.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q4/f59.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q3/f738.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q4/f1298.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q1/f1740.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q4/f1264.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q4/f299.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q1/f467.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q1/f1751.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q3/f1262.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q1/f1287.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q1/f945.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q1/f87.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q3/f1585.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q4/f214.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q1/f258.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q4/f112.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q2/f1742.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q2/f1776.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q4/f1603.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q2/f550.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f584.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q2/f1753.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q3/f712.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q3/f507.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q1/f698.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q4/f445.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q3/f422.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q4/f204.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q4/f1433.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q2/f695.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q1/f1456.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q1/f1745.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q3/f573.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q4/f855.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q2/f1424.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q4/f61.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q1/f606.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q4/f1575.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q4/f1116.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q3/f1596.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q1/f1479.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q1/f1411.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q1/f591.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q2/f20.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q4/f379.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f873.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q2/f84.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q1/f57.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q1/f361.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q3/f1148.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q1/f206.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q1/f489.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q4/f1564.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q2/f1447.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q4/f280.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q4/f327.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q3/f1207.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q4/f85.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q4/f799.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q4/f423.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q2/f715.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q3/f58.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q2/f996.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q1/f437.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q2/f520.parquet], ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q3/f300.parquet], ReadEntryWithP |
> +------+------+
> 1 row selected (26.19 seconds)
> {noformat}
> Details of the directory structure and size in bytes:
> There are 26 sub-directories, each with four sub-directories Q1, Q2, Q3 and Q4.
> Each such directory has 2000 small parquet files.
> {noformat}
> [root@centos-01 ~]# hadoop fs -du /tmp/DRILL_4589
> 178702459  /tmp/DRILL_4589/.drill.parquet_metadata
> 112420427  /tmp/DRILL_4589/1990
> 112433621  /tmp/DRILL_4589/1991
> 112433621  /tmp/DRILL_4589/1992
> 112433621  /tmp/DRILL_4589/1993
> 112433621  /tmp/DRILL_4589/1994
> 112433621  /tmp/DRILL_4589/1995
> 112433621  /tmp/DRILL_4589/1996
> 112433621  /tmp/DRILL_4589/1997
> 112433621  /tmp/DRILL_4589/1998
> 112433621  /tmp/DRILL_4589/1999
> 112433621  /tmp/DRILL_4589/2000
> 112433621  /tmp/DRILL_4589/2001
> 112433621  /tmp/DRILL_4589/2002
> 112433621  /tmp/DRILL_4589/2003
> 112433621  /tmp/DRILL_4589/2004
> 112433621  /tmp/DRILL_4589/2005
> 112433621  /tmp/DRILL_4589/2006
> 112433621  /tmp/DRILL_4589/2007
> 112433621  /tmp/DRILL_4589/2008
> 112433621  /tmp/DRILL_4589/2009
> 112433621  /tmp/DRILL_4589/2010
> 112433621  /tmp/DRILL_4589/2011
> 112433621  /tmp/DRILL_4589/2012
> 112433621  /tmp/DRILL_4589/2013
> 112433621  /tmp/DRILL_4589/2014
> 112433621  /tmp/DRILL_4589/2015
> total size in bytes of DRILL_4589 directory = 3101633561 => 3.1GB
> {noformat}
> Test result from 1.7.0-SNAPSHOT git commit ID: f7197596
> on 4 node Centos cluster
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select max(dir0), max(dir1), max(dir2) from `DRILL_4589`;
> +---------+---------+---------+
> | EXPR$0  | EXPR$1  | EXPR$2  |
> +---------+---------+---------+
> | 2015    | Q4      | null    |
> +---------+---------+---------+
> 1 row selected (38.05 seconds)
> {noformat}
> Query plan for the above query from Drill 1.7.0; note that usedMetadataFile=true appears in the query plan.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select max(dir0), max(dir1), max(dir2) from `DRILL_4589`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2])
> 00-02        StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)], EXPR$2=[MAX($2)])
> 00-03          UnionExchange
> 01-01            StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)], EXPR$2=[MAX($2)])
> 01-02              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/DRILL_4589]], selectionRoot=/tmp/DRILL_4589, numFiles=1, usedMetadataFile=true, columns=[`dir0`, `dir1`, `dir2`]]])
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
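A quick way to compare the two runs quoted in this issue is to check the EXPLAIN PLAN text for the `usedMetadataFile=true` attribute, count the `ReadEntryWithPath` entries in the scan, and compute the ratio of the two reported execution times. The Python sketch below does this on strings copied from the quoted output. The helper names (`used_metadata_cache`, `count_read_entries`, `slowdown`) are hypothetical and not part of Drill. The 1.9.0 string holds only the first two entries of a scan line that is already truncated in the original output, so its `False` result merely illustrates the reporter's observation rather than proving it.

```python
import re

def used_metadata_cache(plan_text: str) -> bool:
    """Hypothetical helper: True if the plan text reports usedMetadataFile=true."""
    return re.search(r"usedMetadataFile\s*=\s*true", plan_text) is not None

def count_read_entries(plan_text: str) -> int:
    """Hypothetical helper: number of ReadEntryWithPath entries listed in the scan."""
    return plan_text.count("ReadEntryWithPath [")

def slowdown(new_seconds: float, old_seconds: float) -> float:
    """Ratio of the newer run's wall-clock time to the older run's."""
    return new_seconds / old_seconds

# Scan line copied from the Drill 1.7.0 plan quoted above.
scan_170 = ("Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
            "[path=maprfs:///tmp/DRILL_4589]], selectionRoot=/tmp/DRILL_4589, "
            "numFiles=1, usedMetadataFile=true, columns=[`dir0`, `dir1`, `dir2`]]])")

# First two entries of the (already truncated) Drill 1.9.0 scan line quoted above.
scan_190 = ("Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath "
            "[path=/tmp/DRILL_4589/1990/Q1/f672.parquet], ReadEntryWithPath "
            "[path=/tmp/DRILL_4589/2011/Q4/f162.parquet]")

print(used_metadata_cache(scan_170))        # True
print(used_metadata_cache(scan_190))        # False
print(count_read_entries(scan_170))         # 1
print(count_read_entries(scan_190))         # 2
print(round(slowdown(70.644, 38.05), 2))    # 1.86
```

With the quoted timings, 70.644 s against 38.05 s gives a ratio of about 1.86, consistent with the "close to 2X" in the report. A plain regex over the text column is used here for brevity; reading the json column of the EXPLAIN PLAN output would likely be more robust, but that is not shown.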