From: "Tim Armstrong (Code Review)"
Reply-To: tarmstrong@cloudera.com
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Dan Hecht
Date: Tue, 30 May 2017 23:17:58 +0000
Subject: [Impala-ASF-CR] IMPALA-5160: adjust spill buffer size based on planner estimates
X-Gerrit-MessageType: comment
X-Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
X-Gerrit-Commit: 20c3ad6330da5ff0e28d20a0c429ca1bb3077d96

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5160: adjust spill buffer size based on planner estimates
......................................................................


Patch Set 6:

The main scenario where it could lead to additional spilling (assuming mem_limit is fixed) is if the scans grabbed memory that the spillable operators would otherwise have been able to use. Otherwise it could result in different operators spilling, which could have a positive or negative impact on perf.
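To make the planner mechanism concrete, here's a minimal sketch (illustrative Python, not the actual patch) of the kind of policy I mean: size each operator's spillable buffers from the planner's memory estimate rather than a fixed ramp-up. The constants, the per-operator buffer count and the power-of-two rounding are assumptions for illustration only.

# Minimal sketch: derive a spill buffer size from the planner's estimate.
# All constants and the helper below are illustrative assumptions.

MIN_SPILL_BUFFER_BYTES = 64 * 1024        # assumed lower bound
MAX_SPILL_BUFFER_BYTES = 8 * 1024 * 1024  # assumed upper bound
BUFFERS_PER_OPERATOR = 16                 # assumed number of in-memory buffers

def round_down_to_power_of_two(n: int) -> int:
    return 1 << (n.bit_length() - 1)

def choose_spill_buffer_size(estimated_operator_mem_bytes: int) -> int:
    """Split the operator's estimated memory across its buffers, then clamp."""
    per_buffer = max(1, estimated_operator_mem_bytes // BUFFERS_PER_OPERATOR)
    per_buffer = max(MIN_SPILL_BUFFER_BYTES,
                     min(MAX_SPILL_BUFFER_BYTES, per_buffer))
    return round_down_to_power_of_two(per_buffer)

# E.g. the 20.14 MB estimate for hash join 09 in the Q9 summary below:
print(choose_spill_buffer_size(int(20.14 * 1024 * 1024)))  # 1048576 (1 MB)

With something like that, the 20.14 MB estimate for hash join 09 in the Q9 summary below would translate into ~1 MB buffers instead of eventually ramping up to 8 MB buffers.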
If we get it right we'll reduce spilling because more memory can be devoted to the operators that need it.

I ran a few TPC-H queries to compare estimated memory with peak memory. E.g. TPC-H Q9 is below. The estimates are off by quite a bit in some cases, but the error comes mainly from the non-linear memory consumption of the 64KB/512KB/8MB buffer size ramp-up in the old code, rather than cardinality estimation errors.

Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------
21:MERGING-EXCHANGE        1  149.369us  149.369us      175      61.70K          0              0  UNPARTITIONED
12:SORT                    3  410.974us  547.810us      175      61.70K   24.02 MB       16.00 MB
20:AGGREGATE               3    1.261ms    1.306ms      175      61.70K    2.30 MB       10.00 MB  FINALIZE
19:EXCHANGE                3   68.482us   81.851us      525      61.70K          0              0  HASH(nation,o_year)
11:AGGREGATE               3  201.760ms  217.919ms      525      61.70K    2.11 MB       10.00 MB  STREAMING
10:HASH JOIN               3   26.906ms   37.907ms  319.40K     574.29K    1.06 MB       690.00 B  INNER JOIN, BROADCAST
|--18:EXCHANGE             3   17.306us   17.720us       25          25          0              0  BROADCAST
|  05:SCAN HDFS            1   17.458ms   17.458ms       25          25   43.00 KB       32.00 MB  tpch.nation
09:HASH JOIN               3  114.307ms  155.488ms  319.40K     574.29K  169.40 MB       20.14 MB  INNER JOIN, BROADCAST
|--17:EXCHANGE             3  284.583ms  296.048ms  800.00K     800.00K          0              0  BROADCAST
|  03:SCAN HDFS            1  373.590ms  373.590ms  800.00K     800.00K   32.46 MB      176.00 MB  tpch.partsupp
08:HASH JOIN               3   39.011ms   43.315ms  319.40K     574.29K    1.52 MB      107.42 KB  INNER JOIN, BROADCAST
|--16:EXCHANGE             3    2.050ms    2.485ms   10.00K      10.00K          0              0  BROADCAST
|  01:SCAN HDFS            1  775.848ms  775.848ms   10.00K      10.00K    2.04 MB       32.00 MB  tpch.supplier
07:HASH JOIN               3  722.690ms  736.289ms  319.40K     574.29K  153.17 MB       17.83 MB  INNER JOIN, PARTITIONED
|--15:EXCHANGE             3  187.995ms  194.122ms    1.50M       1.50M          0              0  HASH(o_orderkey)
|  04:SCAN HDFS            2  318.553ms  512.326ms    1.50M       1.50M   42.05 MB      176.00 MB  tpch.orders
14:EXCHANGE                3   41.321ms   62.755ms  319.40K     598.58K          0              0  HASH(l_orderkey)
06:HASH JOIN               3  256.056ms  273.355ms  319.40K     598.58K    1.47 MB        1.19 MB  INNER JOIN, BROADCAST
|--13:EXCHANGE             3    1.733ms    1.972ms   10.66K      20.00K          0              0  BROADCAST
|  00:SCAN HDFS            1  821.018ms  821.018ms   10.66K      20.00K   32.04 MB       64.00 MB  tpch.part
02:SCAN HDFS               3    2s611ms    2s690ms    6.00M       6.00M   64.23 MB      264.00 MB  tpch.lineitem

It sounds like we would need to test this with the new code - using the old code won't give any insights because it will be dominated by behaviour that doesn't carry over. Interestingly we might actually win back a fair bit of memory and therefore reduce spilling by avoiding the big jump in memory consumption of 8MB buffers.

I think it might be best, if possible, to agree on the planner mechanism first and then tune the policy based on experiments with a more final version of query execution. I could do experiments with the current prototype of query execution but it feels a bit speculative until more pieces are in place.

--
To view, visit http://gerrit.cloudera.org:8080/6963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I57b5b4c528325d478c8a9b834a6bc5dedab54b5b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: No
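A toy model of the old 64KB/512KB/8MB ramp-up referred to above, showing where the non-linearity in memory consumption comes from; the per-step block counts are made-up assumptions for illustration, not the real block manager behaviour.

# Toy model of the old 64KB/512KB/8MB buffer ramp-up, purely illustrative.
KB, MB = 1024, 1024 * 1024

def buffers_needed(total_bytes: int) -> list:
    """Allocate a couple of small blocks per ramp-up step, then 8MB blocks."""
    blocks, remaining = [], total_bytes
    for size in (64 * KB, 512 * KB):
        for _ in range(2):          # assumed: two blocks per small step
            if remaining <= 0:
                return blocks
            blocks.append(size)
            remaining -= size
    while remaining > 0:            # then full-size 8MB blocks
        blocks.append(8 * MB)
        remaining -= 8 * MB
    return blocks

for data_mb in (0.5, 1, 2, 4, 16):
    used = sum(buffers_needed(int(data_mb * MB)))
    print(f"{data_mb:>4} MB of rows -> {used / MB:.2f} MB of buffers")

Even in this toy version, going from 1 MB to 2 MB of buffered data makes buffer memory jump from roughly 1 MB to over 9 MB, which is the kind of step the planner estimates can't track well.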