druid-dev mailing list archives

From Michael Schiff <michaelsch...@apache.org>
Subject Re: ItemsSketch Aggregator in druid-datasketches extension
Date Fri, 23 Jul 2021 21:13:44 GMT
Thanks for the detailed answer, Gian!  Sounds like the best approach, then, is to pursue a ByteBuffer-based
implementation of ItemsSketch, and then use it for an implementation of ItemsSketchBuildBufferAggregator(Helper)
that performs this same reference update if the sketch has to grow.

On 2021/07/23 20:35:52, Gian Merlino <gian@apache.org> wrote: 
> Hey Michael,
> 
> Very cool!
> 
> To answer your question: it is critical to have a BufferAggregator. Some
> context: there are three kinds of aggregators:
> 
> - Aggregator: stores intermediate state on heap; is used during ingestion
> and by the non-vectorized timeseries query engine. Required, or else some
> queries won't work properly.
> - BufferAggregator: stores intermediate state off-heap; is used by the
> non-vectorized topN and groupBy engines. Required, or else some queries
> won't work properly.
> - VectorAggregator: stores intermediate state off-heap; is used by
> vectorized query engines. Optional, but if there are any aggregators in a
> query that don't have a VectorAggregator implementation, then the whole
> query will run non-vectorized, which will slow it down.
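> 
> To make that concrete, here's a rough sketch of the off-heap pattern. It only
> mirrors the shape of org.apache.druid.query.aggregation.BufferAggregator (the
> real interface has a few more housekeeping methods and selector wiring), so
> treat it as an illustration rather than a drop-in class:
> 
> import java.nio.ByteBuffer;
> 
> // Illustrative only: all intermediate state is a single long living in the
> // off-heap ByteBuffer slot that Druid assigns via (buf, position).
> class CountingBufferAggregatorSketch
> {
>   public void init(ByteBuffer buf, int position)
>   {
>     buf.putLong(position, 0L);                         // zero out our slot
>   }
> 
>   public void aggregate(ByteBuffer buf, int position)
>   {
>     buf.putLong(position, buf.getLong(position) + 1);  // update in place, off heap
>   }
> 
>   public Object get(ByteBuffer buf, int position)
>   {
>     return buf.getLong(position);                      // read the result back out
>   }
> 
>   public void close()
>   {
>     // nothing on heap to release
>   }
> }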
> 
> The main reason that BufferAggregators exist is to support off-heap
> aggregation, which minimizes GC pressure and prevents out-of-memory errors
> during the aggregation process.
> 
> > Assuming it is, we can begin talking with the datasketches team about the
> possibility of a Direct implementation.
> 
> With ItemsSketch, the biggest roadblock you're likely to run into is the
> fact that the items may be of variable size. Currently in Druid each
> BufferAggregator (and VectorAggregator) must have a fixed-size amount of
> space allocated off heap. The way this is done is by returning the number
> of bytes you need from AggregatorFactory.getMaxIntermediateSize. It's all
> allocated upfront. It can depend on aggregator configuration (for example,
> the HLL aggregators need more space if they're configured to have higher
> accuracy) but it can't depend on the data that is encountered during the
> query. So if you do have variable-size items, you get into quite a
> conundrum!
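> 
> Roughly, the sizing hook works like the hypothetical factory below (the class,
> fields, and byte math are all made up for illustration; only the
> getMaxIntermediateSize name comes from AggregatorFactory):
> 
> // Hypothetical ItemsSketch factory sizing. getMaxIntermediateSize can only
> // look at configuration, never at the data, so with variable-size items the
> // per-item budget is necessarily a guess.
> public class ItemsSketchAggregatorFactory
> {
>   private final int maxMapSize;           // the sketch's configured map size
>   private final int assumedBytesPerItem;  // an assumption; real items may be larger
> 
>   public ItemsSketchAggregatorFactory(int maxMapSize, int assumedBytesPerItem)
>   {
>     this.maxMapSize = maxMapSize;
>     this.assumedBytesPerItem = assumedBytesPerItem;
>   }
> 
>   // Druid calls this before any rows are seen and allocates exactly this many
>   // bytes off heap for each aggregation slot; the value cannot change later.
>   public int getMaxIntermediateSize()
>   {
>     // placeholder math: preamble + counters + estimated item payload
>     return 64 + maxMapSize * (Long.BYTES + assumedBytesPerItem);
>   }
> }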
> 
> In order to have a "proper" implementation of an off-heap variable-sized
> sketch, we'd need an aggregator API that allows them to allocate additional
> memory at runtime. The DataSketches folks have asked us for this before but
> we haven't built it yet. There are a couple of ways you could solve it in the
> meantime:
> 
> 1) Allocate a fixed-size amount of memory that is enough to store most data
> you reasonably expect to encounter, update that off-heap, and allocate
> additional memory on heap if needed, without Druid "knowing" about it. This
> is something that'd require a Direct version of the ItemsSketch. It's a
> technique used by the quantiles sketches too. It's a hack but it works. You
> can see it in action by looking at DirectUpdateDoublesSketch
> <https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/quantiles/DirectUpdateDoublesSketch.java>
> in DataSketches (check out the spots where "mem_" is reassigned: it's
> allocating new on-heap memory that Druid doesn't know about) and
> DoublesSketchBuildBufferAggregator in Druid (check out the "sketches" map
> and "relocate" method in DoublesSketchBuildBufferAggregatorHelper: it's
> making sure to retain the references just in case one of them grew into the
> heap).
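> 
> Paraphrased very loosely (this is not the actual DataSketches code, just the
> shape of the reference swap, using the datasketches-memory WritableMemory API
> as I remember it):
> 
> import org.apache.datasketches.memory.WritableMemory;
> 
> // The "grow into heap" technique: start on Druid's off-heap slice and, if the
> // data outgrows it, quietly move to heap memory that Druid never sees.
> class GrowableDirectStateSketch
> {
>   private WritableMemory mem_;   // initially wraps Druid's off-heap slot
> 
>   GrowableDirectStateSketch(WritableMemory offHeapSlice)
>   {
>     this.mem_ = offHeapSlice;
>   }
> 
>   void ensureCapacity(long requiredBytes)
>   {
>     if (requiredBytes > mem_.getCapacity()) {
>       // Out of off-heap room: allocate on heap, copy, and swap the reference.
>       // This is the reassignment of mem_ that Druid "doesn't know about".
>       WritableMemory grown = WritableMemory.allocate((int) requiredBytes);
>       mem_.copyTo(0, grown, 0, mem_.getCapacity());
>       mem_ = grown;
>     }
>   }
> }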
> 
> 2) Allocate some arbitrary amount of memory, but don't use it; instead, use a
> technique like the one in DoublesSketchBuildBufferAggregatorHelper to
> simply maintain references to on-heap sketches. This would work but you
> would run the risk of running out of heap memory. (There's nothing that
> explicitly controls how much heap you'll use.) To a degree you can mitigate
> this by allocating a larger amount of unused off-heap memory via
> AggregatorFactory.getMaxIntermediateSize, which will get Druid to flush
> aggregation state more often (it does this when it runs out of off-heap
> buffer space), which will sort of limit how many sketches are going to be
> in flight at once. It's quite indirect but it's the best I can think of.
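> 
> A hypothetical helper for that approach, modeled loosely on
> DoublesSketchBuildBufferAggregatorHelper (none of this is existing Druid code,
> and the class name is just the one you floated), might look like:
> 
> import java.nio.ByteBuffer;
> import java.util.HashMap;
> import java.util.IdentityHashMap;
> import java.util.Map;
> import org.apache.datasketches.frequencies.ItemsSketch;
> 
> // The off-heap slot is never actually written; the real sketch lives on heap,
> // keyed by buffer identity and position.
> class ItemsSketchBuildBufferAggregatorHelper
> {
>   private final int maxMapSize;  // must be a power of 2 for ItemsSketch
>   private final IdentityHashMap<ByteBuffer, Map<Integer, ItemsSketch<String>>> sketches =
>       new IdentityHashMap<>();
> 
>   ItemsSketchBuildBufferAggregatorHelper(int maxMapSize)
>   {
>     this.maxMapSize = maxMapSize;
>   }
> 
>   void init(ByteBuffer buf, int position)
>   {
>     sketches.computeIfAbsent(buf, b -> new HashMap<>())
>             .put(position, new ItemsSketch<>(maxMapSize));
>   }
> 
>   ItemsSketch<String> getSketch(ByteBuffer buf, int position)
>   {
>     return sketches.get(buf).get(position);
>   }
> 
>   // When Druid moves aggregation state between buffers/positions, carry the
>   // heap reference along so it isn't orphaned (the "relocate" idea).
>   void relocate(int oldPosition, int newPosition, ByteBuffer oldBuf, ByteBuffer newBuf)
>   {
>     ItemsSketch<String> sketch = sketches.get(oldBuf).remove(oldPosition);
>     sketches.computeIfAbsent(newBuf, b -> new HashMap<>()).put(newPosition, sketch);
>   }
> }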
> 
> > I am also thinking of finishing the implementation by explicitly
> serializing the entire sketch on each update, but this would only be for
> experimentation as I doubt this is the intended behavior for
> implementations of BufferAggregator.
> 
> You're right, that's not the intended sort of implementation. It works but
> it's usually too slow to be practical. (It incurs a full serialization
> roundtrip for every row.)
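> 
> For completeness, that experiment would look roughly like the snippet below.
> It's hypothetical and heavily simplified: it assumes init() already wrote an
> empty serialized sketch into the slot, and the ArrayOfStringsSerDe import has
> moved between datasketches-java versions. The cost is visible directly; every
> row pays a full heapify, one update, and a full re-serialization:
> 
> import java.nio.ByteBuffer;
> import org.apache.datasketches.ArrayOfStringsSerDe;  // package varies by version
> import org.apache.datasketches.frequencies.ItemsSketch;
> import org.apache.datasketches.memory.Memory;
> 
> // Hypothetical "serialize the whole sketch on every update" aggregate step.
> class SerializeEveryRowSketch
> {
>   private static final ArrayOfStringsSerDe SERDE = new ArrayOfStringsSerDe();
> 
>   static void aggregate(ByteBuffer buf, int position, int maxSize, String item)
>   {
>     // 1) copy the serialized sketch out of the off-heap slot
>     byte[] bytes = new byte[maxSize];
>     ByteBuffer view = buf.duplicate();
>     view.position(position);
>     view.get(bytes);
> 
>     // 2) heapify and update on heap
>     ItemsSketch<String> sketch = ItemsSketch.getInstance(Memory.wrap(bytes), SERDE);
>     sketch.update(item);
> 
>     // 3) write the whole updated sketch back into the slot
>     byte[] updated = sketch.toByteArray(SERDE);
>     ByteBuffer out = buf.duplicate();
>     out.position(position);
>     out.put(updated);
>   }
> }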
> 
> On Fri, Jul 23, 2021 at 12:18 PM Michael Schiff <michaelschiff@apache.org>
> wrote:
> 
> > I am looking into implementing a new Aggregator in the datasketches
> > extension using the ItemsSketch in the frequencies package:
> >
> > https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html
> >
> > https://github.com/apache/datasketches-java/tree/master/src/main/java/org/apache/datasketches/frequencies
> >
> > I've started on a partial implementation here (still a WIP, lots of TODOs):
> >
> > https://github.com/apache/druid/compare/master...michaelschiff:fis-aggregator?expand=1
> >
> > From everything I've seen, it's critical that there is an efficient
> > implementation of BufferAggregator. The existing aggregators take advantage
> > of other sketch types providing "Direct" implementations that are
> > implemented directly against a ByteBuffer.  This leads to a fairly
> > transparent implementation of BufferAggregator.  ItemsSketch is able to
> > serialize itself and to wrap a ByteBuffer for instantiation, but the actual
> > interactions are all on heap (the core of the implementation is
> > https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java
> > ).
> >
> > Can anyone confirm that it is critical (i.e., the Aggregator will not function)
> > to have an implementation of BufferAggregator? Assuming it is, we can begin
> > talking with the datasketches team about the possibility of a Direct
> > implementation.  I am also thinking of finishing the implementation by
> > explicitly serializing the entire sketch on each update, but this would
> > only be for experimentation as I doubt this is the intended behavior for
> > implementations of BufferAggregator.
> >
> 
