druid-dev mailing list archives

From Michael Schiff <michaelsch...@apache.org>
Subject ItemsSketch Aggregator in druid-datasketches extension
Date Fri, 23 Jul 2021 19:18:16 GMT
I am looking into implementing a new Aggregator in the datasketches extension using the ItemsSketch
in the frequencies package:


I've started on a partial implementation here (still a WIP, lots of TODOs):

From everything I've seen, it's critical that there is an efficient implementation of
BufferAggregator. The existing aggregators take advantage of other sketch types providing
"Direct" implementations that operate directly on a ByteBuffer, which makes the corresponding
BufferAggregator fairly straightforward to implement. ItemsSketch can serialize itself
and can be instantiated from a wrapped ByteBuffer, but its actual operations all happen on heap (the core
of the implementation is https://github.com/apache/datasketches-java/blob/27ecce938555d731f29df97f12f4744a0efb663d/src/main/java/org/apache/datasketches/frequencies/ReversePurgeItemHashMap.java).

Can anyone confirm that it is critical (i.e. the Aggregator will not function without it) to have an
implementation of BufferAggregator? Assuming it is, we can begin talking with the datasketches team about
the possibility of a Direct implementation. I am also thinking of finishing the implementation
by explicitly serializing the entire sketch on each update, but this would only be for experimentation,
as I doubt this is the intended behavior for implementations of BufferAggregator.
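
To make the serialize-on-every-update idea concrete, here is a minimal sketch of what I have in mind. It is
not the Druid or datasketches API: HeapFrequencyCounter is a hypothetical stand-in for an on-heap sketch
(it counts exactly rather than approximately), SerializeOnUpdateAggregator only loosely mirrors the
init/aggregate/get shape of BufferAggregator, and the byte layout is made up for illustration. It also
assumes the buffer region is large enough to hold the serialized state.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an on-heap sketch; NOT the datasketches ItemsSketch API.
final class HeapFrequencyCounter {
    private final Map<String, Long> counts = new HashMap<>();

    void update(String item) {
        counts.merge(item, 1L, Long::sum);
    }

    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }

    // Assumed layout: int entryCount, then per entry: int keyLength, UTF-8 key bytes, long count.
    void writeTo(ByteBuffer buf, int position) {
        ByteBuffer out = buf.duplicate(); // leave the caller's position/limit untouched
        out.position(position);
        out.putInt(counts.size());
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.putInt(key.length).put(key).putLong(e.getValue());
        }
    }

    static HeapFrequencyCounter readFrom(ByteBuffer buf, int position) {
        ByteBuffer in = buf.duplicate();
        in.position(position);
        HeapFrequencyCounter counter = new HeapFrequencyCounter();
        int entries = in.getInt();
        for (int i = 0; i < entries; i++) {
            byte[] key = new byte[in.getInt()];
            in.get(key);
            counter.counts.put(new String(key, StandardCharsets.UTF_8), in.getLong());
        }
        return counter;
    }
}

// Hypothetical, simplified aggregator: every update is deserialize -> mutate on heap -> reserialize.
final class SerializeOnUpdateAggregator {
    void init(ByteBuffer buf, int position) {
        new HeapFrequencyCounter().writeTo(buf, position); // write an empty state
    }

    void aggregate(ByteBuffer buf, int position, String item) {
        HeapFrequencyCounter counter = HeapFrequencyCounter.readFrom(buf, position);
        counter.update(item);
        counter.writeTo(buf, position); // full rewrite of the state on each update
    }

    long get(ByteBuffer buf, int position, String item) {
        return HeapFrequencyCounter.readFrom(buf, position).estimate(item);
    }
}
```

Every call to aggregate pays for a full deserialization and reserialization of the state, so the per-update
cost grows with the size of the sketch. That is why I would only treat this as an experiment, and why a
Direct implementation that mutates the ByteBuffer in place looks like the better long-term answer.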

