impala-issues mailing list archives

From "Taras Bobrovytsky (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IMPALA-4787) Optimize APPX_MEDIAN() mem usage in case of many grouping keys
Date Tue, 21 Mar 2017 00:45:42 GMT

    [ https://issues.apache.org/jira/browse/IMPALA-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929211#comment-15929211
] 

Taras Bobrovytsky edited comment on IMPALA-4787 at 3/21/17 12:44 AM:
---------------------------------------------------------------------

{code}
commit 529a5f99b959079faead34a977fba1125d01840e
Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Date:   Mon Feb 13 18:14:56 2017 -0800

    IMPALA-4787: Optimize APPX_MEDIAN() memory usage
    
    Before this change, ReservoirSample functions (such as APPX_MEDIAN())
    allocated memory for 20,000 elements up front per grouping key. This
    caused inefficient memory usage for aggregations with many grouping
    keys.
    
    This patch fixes this by initially allocating memory for 16 elements.
    Once the buffer becomes full, we reallocate a new buffer with double
    capacity and copy the original buffer into the new one. We continue
    doubling the buffer size until the buffer has room for 20,000 elements
    as before.
    
    Testing:
    Added some EE APPX_MEDIAN() tests on larger datasets that exercise the
    resize code path.
{code}
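
The growth policy described in the commit message above can be sketched roughly as follows. This is an illustrative Python model only, not the Impala backend code: the class `GrowableSampleBuffer` and its methods are hypothetical names, and clamping the final doubling at exactly 20,000 elements is an assumption (the commit message only says the buffer keeps doubling until it has room for 20,000 elements).

```python
INITIAL_CAPACITY = 16    # initial allocation, from the commit message
MAX_CAPACITY = 20_000    # final capacity, from the commit message


class GrowableSampleBuffer:
    """Hypothetical stand-in for the per-grouping-key sample buffer."""

    def __init__(self):
        self.capacity = INITIAL_CAPACITY
        self.slots = [None] * self.capacity  # models the allocated block
        self.size = 0
        self.reallocations = 0

    def append(self, value):
        """Store value, doubling the buffer first if it is full.

        Returns False once the buffer is full at MAX_CAPACITY; a real
        reservoir sampler would then start replacing existing samples.
        """
        if self.size == self.capacity:
            if self.capacity >= MAX_CAPACITY:
                return False
            self._grow()
        self.slots[self.size] = value
        self.size += 1
        return True

    def _grow(self):
        # Assumption: the last doubling is clamped to MAX_CAPACITY.
        new_capacity = min(self.capacity * 2, MAX_CAPACITY)
        new_slots = [None] * new_capacity
        new_slots[: self.size] = self.slots[: self.size]  # copy old contents
        self.slots = new_slots
        self.capacity = new_capacity
        self.reallocations += 1
```

Under these assumptions, a grouping key that only ever sees a handful of values stays at 16 slots, while a key that fills the buffer reaches full capacity after 11 reallocations (16, 32, ..., 16,384, 20,000).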


was (Author: tarasbob):
{code}
commit 1f4c37ab7e7e0bcc832e94f38fcf0a24970ae3c2
Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Date:   Wed Jan 4 14:33:08 2017 -0800

    IMPALA-3586: Implement union passthrough
    
    The union node acts as a passthrough operator and forwards row batches
    from its children without materializing them. This is done when the
    child's tuple layout is identical to the union node's tuple layout
    and no functions need to be applied to the child row batches.
    
    Removed operand reordering in the FE because it's simpler and safer to
    handle all passthrough children before non-passthrough children in the
    BE. The recent improvements to memory management allowed us to remove
    this requirement.
    
    A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
    as a precaution and for testing purposes.
    
    Testing:
    - Added new planner and end to end tests that cover the new
      functionality.
    - Updated existing tests to reflect the new behavior.
{code}
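
As a rough illustration of the passthrough condition stated in the earlier comment text above, the sketch below models the check in Python. It is not taken from the Impala source: `Child`, `is_passthrough`, and `order_children` are hypothetical names, and representing a tuple layout as a plain Python tuple is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    """Hypothetical model of one union operand."""
    name: str
    tuple_layout: tuple      # assumption: layout modeled as a tuple of types
    needs_functions: bool    # True if expressions must be applied to rows


def is_passthrough(child, union_layout):
    # Both conditions from the commit message: identical tuple layout and
    # no functions to apply to the child's row batches.
    return child.tuple_layout == union_layout and not child.needs_functions


def order_children(children, union_layout):
    # Handle all passthrough children before non-passthrough children,
    # preserving the relative order within each group.
    passthrough = [c for c in children if is_passthrough(c, union_layout)]
    materialized = [c for c in children if not is_passthrough(c, union_layout)]
    return passthrough + materialized
```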

> Optimize APPX_MEDIAN() mem usage in case of many grouping keys
> --------------------------------------------------------------
>
>                 Key: IMPALA-4787
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4787
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Marcell Szabo
>            Assignee: Taras Bobrovytsky
>            Priority: Critical
>              Labels: usability
>             Fix For: Impala 2.9.0
>
>
> APPX_MEDIAN uses a lot of memory per grouping key. It allocates space for 20,000 samples
per grouping key to estimate the median. The current implementation is targeted at non-grouping
aggregations or aggregations with relatively few distinct grouping keys.
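
To give a feel for the scale of the problem described in the issue, the following back-of-the-envelope calculation estimates the up-front allocation. The 20,000 samples per key comes from the issue description; the 8 bytes per sample and the function name `upfront_bytes` are assumptions chosen for illustration, since the actual per-sample size depends on the column type and is not stated here.

```python
SAMPLES_PER_KEY = 20_000   # from the issue description
BYTES_PER_SAMPLE = 8       # assumption: e.g. one 8-byte value per sample


def upfront_bytes(num_keys, samples_per_key=SAMPLES_PER_KEY,
                  bytes_per_sample=BYTES_PER_SAMPLE):
    """Memory reserved up front if every key gets a full-size buffer."""
    return num_keys * samples_per_key * bytes_per_sample


# One grouping key: 160,000 bytes.
one_key = upfront_bytes(1)

# One million grouping keys: 160,000,000,000 bytes (160 GB, decimal units).
million_keys = upfront_bytes(1_000_000)
```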



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
