drill-issues mailing list archives

From "Kunal Khatua (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5715) Performance of refactored HashAgg operator regressed
Date Thu, 10 Aug 2017 21:16:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kunal Khatua updated DRILL-5715:
--------------------------------
    Description: 
When running the following simple HashAgg-based query on the TPC-H lineitem table (6 billion
rows) on a 10-node setup (with a single partition to disable any possible spilling to disk)

{code:sql}
select count(*) 
from (
  select l_quantity
    , count(l_orderkey) 
  from lineitem 
  group by l_quantity 
)  {code}

the runtime increased from {{7.378 sec}} on Drill 1.10.0 to {{11.323 sec}} on Drill 1.11.0, roughly a 53% slowdown [as reported by the JDBC client].
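
One way to check that the query is actually planned with a HashAgg operator (rather than a streaming aggregate) is to inspect its plan. This is only a sketch: it assumes the same {{lineitem}} table is reachable from the current schema, and the report does not say whether such a check was run.

{code:sql}
-- Sketch only: print the physical plan and look for a HashAgg operator in it.
-- Assumes lineitem resolves in the current schema, as in the query above.
EXPLAIN PLAN FOR
select count(*)
from (
  select l_quantity
    , count(l_orderkey)
  from lineitem
  group by l_quantity
);
{code}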

To disable spill-to-disk in Drill 1.11.0, {{drill-override.conf}} was modified to set
{code}drill.exec.hashagg.num_partitions : 1{code}

Attached are two profiles:
Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill] 
Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]

A separate run was done for each build with {{planner.width.max_per_node=10}} and
profiled with YourKit.
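
For reference, the width option can be applied from a SQL client before running the query. This is a minimal sketch that assumes session scope; the report does not state whether the option was set at session or system level.

{code:sql}
-- Assumption: session scope; the report does not say how the option was set.
ALTER SESSION SET `planner.width.max_per_node` = 10;
{code}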

Image snippets are attached, indicating the hotspots in both builds:
Drill 1.10.0 : 
 Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
 CallTree: [^drill-1.10.0_callTree.png]
 HotSpot: [^drill-1.10.0_hotspot.png]
Drill 1.11.0 : 
 Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
 CallTree: [^drill-1.11.0_callTree.png]
 HotSpot: [^drill-1.11.0_hotspot.png] 


  was:
When running the following simple HashAgg-based query on a TPCH-table - Lineitem with 6Billion
rows on a 10 node setup (with a single partition to disable any possible spilling to disk)

{code:sql}
select count(*) 
from (
  select l_quantity
    , count(l_orderkey) 
  from lineitem 
  group by l_quantity 
)  {code}

the runtime increased from {{7.378 sec}} to {{11.323 sec}} [reported by the JDBC client].

To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was modified to 
{code}drill.exec.hashagg.num_partitions : 1{code}

Attached are two profiles
Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill] 
Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]

A separate run was done for both scenarios with the {{planner.width.max_per_node=10}} and
profiled with YourKit.

Image snippets are attached, indicating the hotspots in both builds:
Drill 1.10.0 : 
 Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
 HotSpot: drill-1.10.0_hotspot.jpg
Drill 1.11.0 : 
 Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
 HotSpot: [^drill-1.11.0_hotspot.jpg] 



> Performance of refactored HashAgg operator regressed
> ----------------------------------------------------
>
>                 Key: DRILL-5715
>                 URL: https://issues.apache.org/jira/browse/DRILL-5715
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>    Affects Versions: 1.11.0
>         Environment: 10-node RHEL 6.4 (32 Core, 256GB RAM)
>            Reporter: Kunal Khatua
>            Assignee: Boaz Ben-Zvi
>              Labels: performance, regression
>             Fix For: 1.12.0
>
>         Attachments: 26736242-d084-6604-aac9-927e729da755.sys.drill, 26736615-9e86-dac9-ad77-b022fd791f67.sys.drill, 2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill, 2675de42-3789-47b8-29e8-c5077af136db.sys.drill, drill-1.10.0_callTree.png, drill-1.10.0_hotspot.png, drill-1.11.0_callTree.png, drill-1.11.0_hotspot.png
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
