drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-6087) Aggregates that use ObjectHolder will fail when Hash Agg spills
Date Sun, 14 Jan 2018 06:09:00 GMT
Paul Rogers created DRILL-6087:

             Summary: Aggregates that use ObjectHolder will fail when Hash Agg spills
                 Key: DRILL-6087
                 URL: https://issues.apache.org/jira/browse/DRILL-6087
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.12.0
            Reporter: Paul Rogers

Drill has this thing called an “ObjectVector”, which is a vector that holds onto Java objects.
We use it for things like the system tables.

The ObjectVector has something called an ObjectHolder. For various reasons (see [this Wiki
writeup|https://github.com/paul-rogers/drill/wiki/Aggregate-UDFs]), some Drill aggregates use
this holder to store working values that need more than a few numbers.

As it turns out, all the Decimal AVG functions use the ObjectHolder to hold their intermediate
values. (The same is true of Decimal Max, Min and Sum, and of Max and Min for VarBytes. Just
do a code search for uses of ObjectHolder.)
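To illustrate why such aggregates reach for an object holder, here is a simplified sketch of a Decimal AVG whose working state (an arbitrary-precision running sum plus a row count) does not fit in a fixed-width holder. The `ObjectHolder`, `AvgState`, `setup`, `add` and `output` names below are hypothetical stand-ins written for this example; they only mimic the idea of Drill's holder (a single field that can reference any Java object) and are not Drill's actual classes or UDF API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalAvgSketch {
    // Hypothetical stand-in for an object holder: one field that can reference any Java object.
    static final class ObjectHolder {
        Object obj;
    }

    // Hypothetical working state for a Decimal AVG: arbitrary-precision sum plus a row count.
    static final class AvgState {
        BigDecimal sum = BigDecimal.ZERO;
        long count = 0;
    }

    // Called once per group: allocate the Java object that holds the working values.
    static void setup(ObjectHolder holder) {
        holder.obj = new AvgState();
    }

    // Called once per input row: update the state through the holder.
    static void add(ObjectHolder holder, BigDecimal value) {
        AvgState state = (AvgState) holder.obj;
        state.sum = state.sum.add(value);
        state.count++;
    }

    // Called at the end of the group: compute the average at the requested scale.
    static BigDecimal output(ObjectHolder holder, int scale) {
        AvgState state = (AvgState) holder.obj;
        if (state.count == 0) {
            return null; // AVG over no rows is NULL
        }
        return state.sum.divide(BigDecimal.valueOf(state.count), scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        ObjectHolder holder = new ObjectHolder();
        setup(holder);
        add(holder, new BigDecimal("1.10"));
        add(holder, new BigDecimal("2.20"));
        add(holder, new BigDecimal("3.30"));
        System.out.println(output(holder, 2)); // prints 2.20
    }
}
```

The point of the sketch is that the intermediate value lives in an ordinary Java object on the heap, not in a flat buffer, which is exactly what becomes a problem once that value has to leave memory.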

In the old pre-spill days, things worked fine. But, with Hash Agg spilling, we need to write
intermediate values out to disk, then read them back.

But the ObjectVector never implemented the methods needed for spilling! Instead, those methods
throw an UnsupportedOperationException.
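The failure mode can be sketched with a toy spill round trip. This is a minimal illustration under stated assumptions, not Drill's spill code: the `SpillableVector` interface, the `LongVector` and `ObjectListVector` classes, and `spillAndReload` are all hypothetical names invented here. A fixed-width vector has an obvious byte layout and round-trips cleanly; the object-valued vector, like the ObjectVector described above, has no on-disk format and throws `UnsupportedOperationException` as soon as the spill path asks it to serialize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {
    // Hypothetical interface: what a spilling operator needs from every vector.
    interface SpillableVector {
        void writeTo(DataOutputStream out) throws IOException;
        void readFrom(DataInputStream in) throws IOException;
    }

    // Fixed-width values serialize trivially: a count followed by the raw longs.
    static final class LongVector implements SpillableVector {
        final List<Long> values = new ArrayList<>();

        public void writeTo(DataOutputStream out) throws IOException {
            out.writeInt(values.size());
            for (long v : values) out.writeLong(v);
        }

        public void readFrom(DataInputStream in) throws IOException {
            values.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) values.add(in.readLong());
        }
    }

    // Hypothetical object-valued vector: arbitrary Java objects have no defined
    // byte layout, so both spill methods simply refuse.
    static final class ObjectListVector implements SpillableVector {
        final List<Object> values = new ArrayList<>();

        public void writeTo(DataOutputStream out) {
            throw new UnsupportedOperationException("object vector cannot be spilled");
        }

        public void readFrom(DataInputStream in) {
            throw new UnsupportedOperationException("object vector cannot be spilled");
        }
    }

    // Writes the source vector to an in-memory "spill file", then reads it into target.
    static void spillAndReload(SpillableVector source, SpillableVector target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                source.writeTo(out);
            }
            byte[] spilled = bytes.toByteArray();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(spilled))) {
                target.readFrom(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LongVector counts = new LongVector();
        counts.values.add(3L);
        counts.values.add(7L);
        LongVector reloaded = new LongVector();
        spillAndReload(counts, reloaded);
        System.out.println(reloaded.values); // prints [3, 7]

        ObjectListVector partials = new ObjectListVector();
        partials.values.add(new java.math.BigDecimal("6.60"));
        try {
            spillAndReload(partials, new ObjectListVector());
        } catch (UnsupportedOperationException e) {
            System.out.println("spill failed: " + e.getMessage());
        }
    }
}
```

In this toy model, as in the report, nothing goes wrong until a spill actually happens; the in-memory aggregation path never calls `writeTo` or `readFrom`.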

What does this mean?

If a query uses one of the aggregate functions above, runs with the Hash Agg, and has enough
data to cause spilling, the query will fail. The same query will work with the Streaming Agg,
or with less data so that the Hash Agg does not spill.

This message was sent by Atlassian JIRA
