impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <>
Subject [jira] [Resolved] (IMPALA-5629) list::size() in BufferedTupleStreamV2::AdvanceWritePage() is expensive
Date Tue, 11 Jul 2017 03:24:00 GMT


Tim Armstrong resolved IMPALA-5629.
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

IMPALA-5629: avoid expensive list::size() call

As a workaround until we move to GCC5+, explicitly track the size of
the pages_ list. This is not too bad in practice since the list is
only mutated in 3 places.
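
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual Impala change: the Page struct, the PageList wrapper, and its method names are hypothetical, and the three mutators here only stand in for the "3 places" where the real pages_ list is modified. The idea is to update a counter at every mutation so that callers never need list::size().

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical stand-in for a page; not Impala's actual page type.
struct Page {
  int id;
};

// Hypothetical wrapper that keeps a counter in step with every mutation of
// pages_, so the size can be read without walking the list.
class PageList {
 public:
  void PushBack(Page page) {
    pages_.push_back(std::move(page));
    ++num_pages_;
  }

  void PopFront() {
    assert(!pages_.empty());
    pages_.pop_front();
    --num_pages_;
  }

  void Clear() {
    pages_.clear();
    num_pages_ = 0;
  }

  // Constant time regardless of how the standard library implements
  // std::list::size().
  std::size_t NumPages() const { return num_pages_; }

  // Consistency check intended for tests only; pages_.size() may walk the
  // whole list on some standard library implementations.
  bool CheckConsistency() const { return pages_.size() == num_pages_; }

 private:
  std::list<Page> pages_;
  std::size_t num_pages_ = 0;
};
```

The cost of this approach is that every code path that adds or removes a page has to remember to update the counter, which is manageable when there are only a few such paths.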

Ran buffered-tuple-stream-v2-test (the only coverage of
BufferedTupleStreamV2 currently).

Reran the query with the perf issue, confirmed that it was no longer
spending lots of time in BufferedTupleStreamV2::AdvanceWritePage().

Change-Id: Id83fcf68dcc3ea729df167885f999ff32b861e66
Reviewed-by: Dan Hecht <>
Tested-by: Impala Public Jenkins

> list::size() in BufferedTupleStreamV2::AdvanceWritePage() is expensive
> ----------------------------------------------------------------------
>                 Key: IMPALA-5629
>                 URL:
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>              Labels: perf
>             Fix For: Impala 2.10.0
>
> In a test run executing a very large join I saw a lot of CPU being burnt in BufferedTupleStreamV2::AdvanceWritePage().
> It looks like it's all being spent iterating over the pages_ linked list. list::size()
> is an O(n) operation in some implementations.
