impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5347) Parquet scanner has a lot of small CPU inefficiencies
Date Mon, 22 May 2017 15:01:04 GMT
Tim Armstrong created IMPALA-5347:
-------------------------------------

             Summary: Parquet scanner has a lot of small CPU inefficiencies
                 Key: IMPALA-5347
                 URL: https://issues.apache.org/jira/browse/IMPALA-5347
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong
            Priority: Minor


I spent some time profiling the Parquet scanner with perf top. There are many places where
the code is inefficient in ways that are easy to fix. Taken together, fixing them could add up
to a significant performance win for scans.

The assembly of the core MaterializeValueBatch() loop shows several obvious inefficiencies:
* Many loads from memory of values that are constant within the loop.
* The generated bit unpacking and dictionary decoding code is inefficient, e.g. it contains a complicated bounds check.
* Hot functions like DictDecoder::Get() are not inlined.
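
To make the points above concrete, here is a minimal sketch of the general pattern, not Impala's actual code. ScanState, DictGet(), MaterializeNaive() and MaterializeHoisted() are hypothetical names invented for illustration. It assumes a dictionary of int32_t values and fixed-stride output tuples. The second loop copies the loop-invariant fields into locals and calls a small inline lookup whose bounds check is a single unsigned compare.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for per-column scanner state (not Impala's classes).
struct ScanState {
  const int32_t* dict;   // dictionary values
  uint32_t dict_size;    // number of dictionary entries
  int tuple_byte_size;   // stride between output tuples
  int slot_offset;       // byte offset of the slot inside a tuple
};

// Defined inline so the compiler is able to inline it into the loops below.
// The bounds check is one unsigned compare.
inline bool DictGet(const int32_t* dict, uint32_t dict_size, uint32_t code,
                    int32_t* out) {
  if (code >= dict_size) return false;
  *out = dict[code];
  return true;
}

// Baseline: the state-> fields may be re-read on every iteration, because
// the byte stores into 'tuples' could alias *state as far as the compiler
// can tell.
bool MaterializeNaive(const ScanState* state, const uint32_t* codes, int n,
                      uint8_t* tuples) {
  for (int i = 0; i < n; ++i) {
    int32_t val;
    if (!DictGet(state->dict, state->dict_size, codes[i], &val)) return false;
    uint8_t* slot = tuples +
        static_cast<std::ptrdiff_t>(i) * state->tuple_byte_size +
        state->slot_offset;
    std::memcpy(slot, &val, sizeof(val));
  }
  return true;
}

// Same behavior, with the loop-invariant values copied into locals once.
bool MaterializeHoisted(const ScanState* state, const uint32_t* codes, int n,
                        uint8_t* tuples) {
  const int32_t* const dict = state->dict;
  const uint32_t dict_size = state->dict_size;
  const std::ptrdiff_t stride = state->tuple_byte_size;
  uint8_t* const base = tuples + state->slot_offset;
  for (int i = 0; i < n; ++i) {
    int32_t val;
    if (!DictGet(dict, dict_size, codes[i], &val)) return false;
    std::memcpy(base + i * stride, &val, sizeof(val));
  }
  return true;
}
```

Whether a given compiler actually reloads the fields in the baseline depends on the build; checking the generated assembly, as described above, is the way to confirm it.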

On some scans, a lot of time is also spent calling memset() on just one or two bytes inside InitTuple().
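
One possible way to avoid a library call for such tiny clears is sketched below. This is an illustration only, not the fix that was applied in Impala. ClearLeadingBytes() is a hypothetical helper, and the assumption that the bytes to clear sit at the start of the tuple is mine. The idea is to branch on the size so that the one-byte and two-byte cases use a fixed size the compiler can turn into plain stores.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zero the first 'num_bytes' bytes of a tuple.
// Small sizes are handled with fixed-size writes; only larger sizes fall
// back to a variable-length memset() call.
inline void ClearLeadingBytes(uint8_t* tuple, int num_bytes) {
  switch (num_bytes) {
    case 0:
      break;
    case 1:
      tuple[0] = 0;
      break;
    case 2:
      // Constant size: compilers typically expand this inline.
      std::memset(tuple, 0, 2);
      break;
    default:
      std::memset(tuple, 0, static_cast<std::size_t>(num_bytes));
      break;
  }
}
```

If the size is fixed for the whole scan, the branch itself could be decided once outside the per-tuple loop.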



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
