drill-dev mailing list archives

From Ben-Zvi <...@git.apache.org>
Subject [GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...
Date Sat, 09 Sep 2017 02:17:24 GMT
GitHub user Ben-Zvi opened a pull request:


    DRILL-5694: Handle HashAgg OOM by spill and retry, plus perf improvement

    The main change in this PR is adding a "_second way_" to handle memory pressure in the
Hash Aggregate: catch OOM failures when processing a new input row (during put() into the
Hash Table), clean up internal state so the put() can be retried, and throw a new exception,
"**RetryAfterSpillException**". On this exception the caller spills some partition to free
more memory, and retries inserting that new row.
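The catch-spill-retry control flow described above can be sketched as follows. This is a minimal illustrative model, not Drill's actual code: the class names, the tiny `CAPACITY` "memory limit", and the `table.clear()` stand-in for spilling a partition are all assumptions made for the demo; only the exception name `RetryAfterSpillException` comes from the PR.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryOnOomSketch {
    // Stand-in for the new exception the PR introduces.
    static class RetryAfterSpillException extends Exception {}

    static final int CAPACITY = 2;                    // tiny "memory limit" for the demo
    static Map<Integer, Integer> table = new HashMap<>();
    static int spills = 0;

    // put() detects the (simulated) OOM, leaves no partial state behind,
    // and signals the caller to spill and retry.
    static void put(int key) throws RetryAfterSpillException {
        if (table.size() >= CAPACITY) {
            throw new RetryAfterSpillException();     // simulated OOM on insert
        }
        table.put(key, key);
    }

    // Caller: on the new exception, free memory (spill a partition) and retry the row.
    static void insert(int key) {
        while (true) {
            try {
                put(key);
                return;
            } catch (RetryAfterSpillException e) {
                table.clear();                        // stand-in for spilling a partition
                spills++;
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) insert(i);
        System.out.println("spills=" + spills + " size=" + table.size());
    }
}
```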
    In addition, to reduce the risk of OOM when either creating the "Values Batch" (to
match the "Keys Batch" in the Hash Table) or allocating the Outgoing vectors (for the
Values), there are new "_reserves_" -- one for each of the two. A "_reserve_" is a memory
amount subtracted from the memory limit, which is added back to the limit just before it
is needed, hopefully preventing an OOM at that allocation. After the allocation the code
tries to restore that reserve (by subtracting from the limit again, if possible). The
"Outgoing Reserve" is always restored first; if the "Values Batch" reserve is empty just
before calling put(), we skip the put() (just as on an OOM there) and spill to free some
memory (and restore that reserve).
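The reserve bookkeeping amounts to simple arithmetic against the operator's memory limit. Here is a hedged sketch of that accounting; the class, field, and method names are illustrative and not taken from HashAggTemplate:

```java
public class ReserveSketch {
    long memoryLimit;       // the limit the operator allocates against
    long valuesReserve;     // held back for the Values Batch
    long outgoingReserve;   // held back for the Outgoing vectors

    ReserveSketch(long limit, long reserve) {
        // Each reserve is carved out of the limit up front.
        this.memoryLimit = limit - 2 * reserve;
        this.valuesReserve = reserve;
        this.outgoingReserve = reserve;
    }

    // Just before the allocation the reserve protects: return the memory to the
    // limit so the allocation can succeed without tripping the limit.
    long useValuesReserve() {
        memoryLimit += valuesReserve;
        long used = valuesReserve;
        valuesReserve = 0;
        return used;
    }

    // After the allocation: try to carve the reserve out again, if current
    // usage leaves enough headroom under the limit.
    boolean restoreValuesReserve(long reserve, long currentUsage) {
        if (memoryLimit - currentUsage >= reserve) {
            memoryLimit -= reserve;
            valuesReserve = reserve;
            return true;
        }
        return false;   // no headroom: the caller may spill to make room
    }

    public static void main(String[] args) {
        ReserveSketch r = new ReserveSketch(1000, 100);
        System.out.println(r.memoryLimit);                     // 800 after carving two reserves
        r.useValuesReserve();
        System.out.println(r.memoryLimit);                     // 900 while the reserve is in use
        System.out.println(r.restoreValuesReserve(100, 700));  // true: 900 - 700 >= 100
    }
}
```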
    The old "_first way_" is still used: that is the code that predicts the memory needs
and triggers a spill if not enough memory is available. The spill code was separated into
a new method, spillIfNeeded(), which is used in two modes -- either the old way (prediction),
or (when called from the new OOM catch code) with a flag that forces a spill regardless of
available memory. That flag also lowers the priority of the "current partition" when choosing
a partition to spill.
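A minimal sketch of the two spillIfNeeded() modes, assuming a largest-partition spill heuristic for illustration; the field names, sizes, and victim-selection logic here are invented for the demo and are not Drill's actual implementation:

```java
public class SpillModeSketch {
    long allocated = 900, limit = 1000, estBatchSize = 200;
    int currentPartition = 0;
    int[] partitionSizes = {50, 300, 120};
    int lastSpilled = -1;

    // forceSpill == false: old predictive path, spill only if the next batch
    // might not fit under the limit. forceSpill == true: called from the OOM
    // catch, always spill, and deprioritize the partition being inserted into.
    void spillIfNeeded(boolean forceSpill) {
        if (!forceSpill && allocated + estBatchSize <= limit) {
            return;                 // prediction says we still fit
        }
        int victim = -1;
        for (int p = 0; p < partitionSizes.length; p++) {
            if (forceSpill && p == currentPartition) continue;  // lower priority
            if (victim < 0 || partitionSizes[p] > partitionSizes[victim]) victim = p;
        }
        lastSpilled = victim;
        allocated -= partitionSizes[victim];                    // memory freed by the spill
        partitionSizes[victim] = 0;
    }

    public static void main(String[] args) {
        SpillModeSketch s = new SpillModeSketch();
        s.spillIfNeeded(false);             // 900 + 200 > 1000: predictive spill triggers
        System.out.println(s.lastSpilled);  // partition 1 (the largest) is chosen
    }
}
```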
    A new testing option was added (**hashagg_use_memory_prediction**, default true);
setting it to false disables the old "first way". This allows stress-testing the OOM
handling code (which may not be exercised under normal memory allocation).
    The HashTable put() code was rewritten to clean up partial changes in case of an OOM,
and so was the code around the call to put(), to catch the new exception, spill, and retry.
Note that this works for 1st phase aggregation as well (returning rows early).
    For the estimates (in addition to the old "max batch size" estimate), there is now an
estimate for the Values Batch and one for the Outgoing batch. These are used for restoring
the "reserves", and may be resized upward when actual allocations turn out to be bigger.
    Other changes:
    * Improved the "max batch size" estimation -- using the outgoing batch to get the
correct schema (instead of the input batch).
      The only information needed from the input batch is the "max average column size" (see
the change in RecordBatchSizer.java), which gives a better estimate for VarChars.
      Also computed the size of the "no null" bigint columns added to the Values Batch
when the aggregation is SUM, MIN or MAX (see changes in HashAggBatch.java and HashAggregator.java)
    * Using a "plain Java" subclass for the HashTable, because "byte manipulation" breaks
on the new template code (see ChainedHashTable.java)
    * Three configuration options were changed into System/Session options: min_batches_per_partition,
hashagg_max_memory, and hashagg_num_partitions
    * Fixed a potential memory leak in the HashTable BatchHolder ctor (vectors were added
to the container only after a successful allocation, and the container was cleared in case
of OOM; so after a partial allocation, the already-allocated part was not accessible). Also
(per Paul's suggestion) modified some vector templates to clean up after any runtime error
(including an OOM).
    * Performance improvements: eliminated the call to updateBatches() before each hash
computation (now used only when switching to a new SpilledRecordBatch); this was a big overhead.
       Also changed all the "setSafe" calls into "set" for the HashTable (those nanoseconds
add up, especially when rehashing) -- these bigint vectors need no resizing.
    * Ignore "(spill) file not found" error while cleaning up.
    * The unit tests were rewritten in a more compact form, and a test using the new option
(forcing the OOM code path, with memory prediction disabled) was added.
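On the setSafe-to-set change above: for fixed-width vectors, the "safe" setter checks capacity and may reallocate on every call, while the plain setter writes directly into pre-sized storage. The `MiniVector` class below is an illustrative stand-in, not Drill's BigIntVector API; it only models why the bounds check costs something when the table has already sized its vectors:

```java
public class SetVsSetSafeSketch {
    static class MiniVector {
        long[] data = new long[4];
        int reallocs = 0;

        // No bounds check: the caller has already sized the vector.
        void set(int index, long value) {
            data[index] = value;
        }

        // Grows the backing storage as needed, at the cost of a check per call
        // and occasional reallocation plus copy.
        void setSafe(int index, long value) {
            while (index >= data.length) {
                data = java.util.Arrays.copyOf(data, data.length * 2);
                reallocs++;
            }
            data[index] = value;
        }
    }

    public static void main(String[] args) {
        MiniVector v = new MiniVector();
        for (int i = 0; i < 16; i++) v.setSafe(i, i);
        System.out.println(v.reallocs);   // 2: grew 4 -> 8 -> 16

        // After pre-sizing (as the hash table does), plain set() suffices:
        MiniVector w = new MiniVector();
        w.data = new long[16];
        for (int i = 0; i < 16; i++) w.set(i, i);
        System.out.println(w.reallocs);   // 0
    }
}
```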

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ben-Zvi/drill DRILL-5694

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #938
commit 1a96bb39faf01b7665bd669d88494789693ed9b8
Author: Ben-Zvi <bben-zvi@mapr.com>
Date:   2017-09-08T22:52:57Z

    DRILL-5694: Handle OOM in HashAggr by spill and retry, plus performance improvement


