impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (IMPALA-4858) Provide better explanation for obscure Memory limit exceeded failures
Date Wed, 15 Mar 2017 18:14:42 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-4858:
----------------------------------
    Priority: Critical  (was: Major)

> Provide better explanation for obscure Memory limit exceeded failures
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-4858
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4858
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: supportability
>
> Today I found two "Memory limit exceeded" failures that are hard to root-cause. It is not clear what was consuming memory at the time, how much memory the query was requesting, or how much was available.
> {code}
> ExecPlanRequest rpc query_id=9c43ca499b1c64c1:a6a91d3900000000 instance_id=9c43ca499b1c64c1:a6a91d3900000068 failed: 
> Memory limit exceeded
> Query 9c43ca499b1c64c1:a6a91d3900000000 could not start because the backend Impala daemon is over its memory limit
> {code}
> It is not clear which Impala daemon didn't have enough memory.
> This is the full profile for the failed query:
> {code}
> ----------------
> Estimated Per-Host Requirements: Memory=10.00MB VCores=1
> WARNING: The following tables are missing relevant table and/or column statistics.
> tpcds_100000_parquet.store_returns
> PLAN-ROOT SINK
> |
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  hosts=133 per-host-mem=unavailable
> |  tuple-ids=1 row-size=8B cardinality=1
> |
> 02:EXCHANGE [UNPARTITIONED]
> |  hosts=133 per-host-mem=unavailable
> |  tuple-ids=1 row-size=8B cardinality=1
> |
> 01:AGGREGATE
> |  output: count(*)
> |  hosts=133 per-host-mem=10.00MB
> |  tuple-ids=1 row-size=8B cardinality=1
> |
> 00:SCAN HDFS [tpcds_100000_parquet.store_returns, RANDOM]
>    partitions=2004/2004 files=7087 size=1.50TB
>    table stats: unavailable
>    column stats: all
>    hosts=133 per-host-mem=0B
>    tuple-ids=0 row-size=0B cardinality=unavailable
> ----------------
>     Estimated Per-Host Mem: 10485760
>     Estimated Per-Host VCores: 1
>     Tables Missing Stats: tpcds_100000_parquet.store_returns
>     Request Pool: root.systest
>     Admission result: Admitted immediately
>     ExecSummary: 
> Operator       #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
> ----------------------------------------------------------------------------------------------------------------------
> 03:AGGREGATE        1   0.000ns   0.000ns      0           1         0        -1.00 B  FINALIZE
> 02:EXCHANGE         1   0.000ns   0.000ns      0           1         0        -1.00 B  UNPARTITIONED
> 01:AGGREGATE      133   0.000ns   0.000ns      0           1         0       10.00 MB
> 00:SCAN HDFS      133   0.000ns   0.000ns      0          -1         0              0  tpcds_100000_parquet.store_...
>     Planner Timeline: 9.968ms
>        - Analysis finished: 1.704ms (1.704ms)
>        - Equivalence classes computed: 1.806ms (101.124us)
>        - Single node plan created: 7.763ms (5.957ms)
>        - Runtime filters computed: 7.794ms (30.470us)
>        - Distributed plan created: 7.958ms (164.549us)
>        - Lineage info computed: 8.001ms (42.703us)
>        - Planning finished: 9.968ms (1.967ms)
>     Query Timeline: 1s238ms
>        - Query submitted: 415.243us (415.243us)
>        - Planning finished: 122.247ms (121.832ms)
>        - Submit for admission: 148.126ms (25.878ms)
>        - Completed admission: 149.066ms (940.329us)
>        - Ready to start 134 fragment instances: 158.642ms (9.576ms)
>        - All 134 fragment instances started: 873.469ms (714.826ms)
>        - Unregister query: 1s235ms (362.225ms)
>      - ComputeScanRangeAssignmentTimer: 12.877ms
>   ImpalaServer:
>      - ClientFetchWaitTimer: 0.000ns
>      - RowMaterializationTimer: 0.000ns
> {code}
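
The profile above reports the same planner estimate twice: once as Memory=10.00MB and once as Estimated Per-Host Mem: 10485760. A minimal Python sketch, assuming 1024-based units, shows that the two figures agree. The helper names format_bytes and parse_estimated_per_host_mem are hypothetical and are not part of Impala.

```python
import re


def format_bytes(n: int) -> str:
    """Render a byte count with 1024-based units (an assumption, not Impala's formatter)."""
    value = float(n)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1024:
            return f"{value:.2f}{unit}"
        value /= 1024
    return f"{value:.2f}TB"


def parse_estimated_per_host_mem(profile: str) -> int:
    """Extract the raw byte count from an 'Estimated Per-Host Mem' profile line."""
    match = re.search(r"Estimated Per-Host Mem:\s*(\d+)", profile)
    if match is None:
        raise ValueError("no 'Estimated Per-Host Mem' line found")
    return int(match.group(1))


# Two lines copied from the profile in the issue description.
profile_excerpt = """
    Estimated Per-Host Mem: 10485760
    Estimated Per-Host VCores: 1
"""

estimate = parse_estimated_per_host_mem(profile_excerpt)
print(format_bytes(estimate))  # prints 10.00MB
```

Note that this is only the planner's estimate; the profile does not say how much memory the daemon actually had available when the fragment failed to start.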
> The second failure is:
> {code}
>  AnalysisException: Failed to evaluate expr: 100
> CAUSED BY: InternalException: Memory limit exceeded
> {code}
> This is the full profile for the second failure:
> {code}
>  Impala Version: impalad version 2.9.0-SNAPSHOT RELEASE (build e0835699a2a2f2ce288ca306b3d7b020dc93a1e3)
>     User: systest@foo
>     Connected User: systest@foo
>     Delegated User: 
>     Network Address: ::ffff:10.17.187.36:55556
>     Default Db: tpcds_30000_parquet
>     Sql Statement: with customer_total_return as
>  (select wr_returning_customer_sk as ctr_customer_sk
>         ,ca_state as ctr_state, 
>  	sum(wr_return_amt) as ctr_total_return
>  from web_returns
>      ,date_dim
>      ,customer_address
>  where wr_returned_date_sk = d_date_sk 
>    and d_year =2002
>    and wr_returning_addr_sk = ca_address_sk 
>  group by wr_returning_customer_sk
>          ,ca_state)
>   select  c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
>        ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
>        ,c_last_review_date,ctr_total_return
>  from customer_total_return ctr1
>      ,customer_address
>      ,customer
>  where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
>  			  from customer_total_return ctr2 
>                   	  where ctr1.ctr_state = ctr2.ctr_state)
>        and ca_address_sk = c_current_addr_sk
>        and ca_state = 'VT'
>        and ctr1.ctr_customer_sk = c_customer_sk
>  order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
>                   ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
>                   ,c_last_review_date,ctr_total_return
> limit 100
>     Coordinator: va1026.foo:22000
>     Query Timeline: 34.362ms
>        - Query submitted: 24.311us (24.311us)
>        - Unregister query: 33.617ms (33.592ms)
> {code}
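
As a rough illustration of the detail the description asks for (which daemon, how much was requested, how much was available), the following Python sketch builds a more descriptive message. It is not Impala's implementation or its eventual fix: the MemLimitContext type, its field names, the host name, and all byte values other than the query ID are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemLimitContext:
    """Hypothetical bundle of facts an error message could report."""
    host: str             # daemon that rejected the request
    query_id: str
    requested_bytes: int  # memory the query tried to obtain
    consumed_bytes: int   # memory already in use on that daemon
    limit_bytes: int      # the daemon's memory limit


def describe_mem_limit_exceeded(ctx: MemLimitContext) -> str:
    """Format a message that names the host and the relevant byte counts."""
    available = max(ctx.limit_bytes - ctx.consumed_bytes, 0)
    return (
        f"Memory limit exceeded on {ctx.host}: query {ctx.query_id} "
        f"requested {ctx.requested_bytes} bytes, but only {available} bytes "
        f"were available (limit {ctx.limit_bytes} bytes, "
        f"current consumption {ctx.consumed_bytes} bytes)"
    )


# Made-up numbers for illustration; only the query ID comes from the report.
example = MemLimitContext(
    host="example-host:22000",
    query_id="9c43ca499b1c64c1:a6a91d3900000000",
    requested_bytes=10_485_760,
    consumed_bytes=995,
    limit_bytes=1_000,
)
print(describe_mem_limit_exceeded(example))
```

A message of this shape would answer the three questions raised in the description without requiring the reader to dig through the profile.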



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)