impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <>
Subject [jira] [Created] (IMPALA-5347) Parquet scanner has a lot of small CPU inefficiencies
Date Mon, 22 May 2017 15:01:04 GMT
Tim Armstrong created IMPALA-5347:

             Summary: Parquet scanner has a lot of small CPU inefficiencies
                 Key: IMPALA-5347
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong
            Priority: Minor

I spent some time looking at the Parquet scanner in perf top. There are many cases where
the code is inefficient in ways that are easily fixed. Together, these could add up to a
significant performance win for scans.

The assembly of the core MaterializeValueBatch() loop has a lot of obvious inefficiency:
* Many loads from memory of values that are constant within the loop
* The generated bit-unpacking and dictionary-decoding code has a lot of inefficiency, e.g. a complicated bounds check
* Hot functions like DictDecoder::Get() are not inlined.
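
To make the three points above concrete, here is a minimal sketch of the kind of loop shape they suggest. It is not Impala's code: SketchDictDecoder, SketchScanner, MaterializeBatch and all field names are hypothetical, and the tuple layout (a flat array of int32_t slots) is an assumption made only to keep the example small. It copies loop-invariant state into locals once, uses a single unsigned compare as the bounds check, and defines the hot accessor in the class body so it is a candidate for inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dictionary decoder; not Impala's DictDecoder.
struct SketchDictDecoder {
  std::vector<int32_t> dict;

  // Defined in the class body, so it is implicitly inline.
  // The bounds check is one unsigned comparison against the dictionary size.
  bool Get(uint32_t index, int32_t* out) const {
    if (index >= dict.size()) return false;
    *out = dict[index];
    return true;
  }
};

// Hypothetical scanner state; the layout is an assumption for illustration.
struct SketchScanner {
  SketchDictDecoder decoder;
  size_t tuple_stride = 1;  // distance between tuples, in int32_t slots
  size_t slot_offset = 0;   // position of the target slot within a tuple

  // Decodes up to n dictionary indices into the target slot of consecutive
  // tuples. Returns the number of values written; stops at the first
  // out-of-range index.
  size_t MaterializeBatch(const uint32_t* indices, size_t n,
                          int32_t* tuples) const {
    assert(tuple_stride > 0);
    // Copy loop-invariant state into locals once, so the loop body does not
    // rely on the compiler proving that these values are unchanged.
    const int32_t* dict = decoder.dict.data();
    const size_t dict_size = decoder.dict.size();
    const size_t stride = tuple_stride;
    int32_t* dst = tuples + slot_offset;

    size_t i = 0;
    for (; i < n; ++i) {
      const uint32_t idx = indices[i];
      if (idx >= dict_size) break;  // single compare per value
      dst[i * stride] = dict[idx];
    }
    return i;
  }
};
```

Whether a given compiler actually reloads such values inside the loop depends on what it can prove about aliasing, so the benefit should be confirmed by inspecting the generated assembly, as was done for this report.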

On some scans, a lot of time is also spent calling memset() on just one or two bytes inside InitTuple().
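
One possible way to avoid the call overhead for such tiny sizes is to special-case them with direct stores and fall back to memset() only for larger regions. The sketch below is an illustration under assumptions: ClearSmallPrefix is a hypothetical helper, not a function in Impala, and the report does not say which bytes InitTuple() clears or how the size is known.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: zero the first `len` bytes of `buf`.
// Sizes of one or two bytes are handled with plain stores so that no
// library call is made on the common small case.
inline void ClearSmallPrefix(uint8_t* buf, size_t len) {
  switch (len) {
    case 0:
      return;
    case 1:
      buf[0] = 0;
      return;
    case 2:
      buf[0] = 0;
      buf[1] = 0;
      return;
    default:
      std::memset(buf, 0, len);  // larger regions still use memset
  }
}
```

If the size is a compile-time constant at the call site, compilers will often expand memset() inline on their own, so this kind of special-casing mainly matters when the size is only known at run time.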
