drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5779) HashAgg template is far too large, cause performance hit
Date Sun, 10 Sep 2017 02:37:00 GMT
Paul Rogers created DRILL-5779:

             Summary: HashAgg template is far too large, cause performance hit
                 Key: DRILL-5779
                 URL: https://issues.apache.org/jira/browse/DRILL-5779
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers

Drill uses code generation to produce query-specific code to copy values, perform calculations,
and so on. Drill does this by generating code based on templates. Internally, Drill copies
the template byte codes and merges them with the generated byte codes. (Drill does not use
Java subclassing for generated code.)

The Hash Agg batch places thousands of lines of boilerplate code into the template. This forces:

1. Drill to copy those byte codes *for every query*.
2. The "byte code fixup" logic to walk the byte code tree for the template *for every query*.
3. The code cache to cache a separate copy of the template *for every query*.

There is a clear performance cost from doing the copying and tree walking. There is a memory
cost to buffering multiple copies of the same code. It is not clear that we have any data
showing that this work benefits the Drill user through better stability, greater performance,
or more features.

We should consider moving the bulk of the code out of the template to avoid the overheads
cited above. The result may be better performance and reduced memory pressure.
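
The idea can be pictured with a minimal sketch. This is not Drill's actual code: the names
SharedHashAggLogic, GeneratedOps and ThinTemplateExample are hypothetical, and the sketch
uses an ordinary interface call instead of Drill's byte-code merging. It only illustrates
keeping the boilerplate in a class that is compiled once, while the per-query part stays
small.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shared class: the bulk of the logic, compiled once and never copied. */
final class SharedHashAggLogic {

    /** Hypothetical interface: the few operations that really are query-specific. */
    interface GeneratedOps {
        int keyOf(int row);     // which group the row belongs to
        long valueOf(int row);  // the value to aggregate for the row
    }

    // Running SUM per group key; stands in for the real hash table.
    private final Map<Integer, Long> sums = new HashMap<>();

    /** Boilerplate loop that would otherwise sit in the template. */
    void aggregate(GeneratedOps ops, int rowCount) {
        for (int row = 0; row < rowCount; row++) {
            sums.merge(ops.keyOf(row), ops.valueOf(row), Long::sum);
        }
    }

    /** Returns the sum for a key, or 0 if the key was never seen. */
    long sumFor(int key) {
        return sums.getOrDefault(key, 0L);
    }
}

/** Hypothetical thin "template": only this small part would vary per query. */
final class ThinTemplateExample implements SharedHashAggLogic.GeneratedOps {
    private final int[] keys;
    private final long[] values;

    ThinTemplateExample(int[] keys, long[] values) {
        this.keys = keys;
        this.values = values;
    }

    @Override
    public int keyOf(int row) {
        return keys[row];
    }

    @Override
    public long valueOf(int row) {
        return values[row];
    }
}
```

With this split, only the small per-query class would need to be copied, fixed up and
cached per query. Whether the extra call into shared code costs more than it saves would
have to be measured.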

This message was sent by Atlassian JIRA