accumulo-user mailing list archives

From Keith Turner
Subject Re: Scans during Compaction
Date Mon, 23 Feb 2015 17:54:02 GMT
This may help.

On Mon, Feb 23, 2015 at 12:35 PM, Dylan Hutchison wrote:

> Hello all,
> When I initiate a full major compaction (with flushing turned on) manually via
> the Accumulo API
> <,,,%20java.util.List,%20boolean,%20boolean)>,
> how does the table appear to
>    1. clients that started scanning the table before the major compaction
>    began;
>    2. clients that start scanning during the major compaction?
> I'm interested in the case where there is an iterator attached to the full
> major compaction that modifies entries (respecting sorted order of entries).
> The best possible answer for my use case, with case #2 more important than
> case #1 and *low latency* more important than high throughput, is that
>    1. clients that started scanning before the compaction began would not
>    see entries altered by the compaction-time iterator;
>    2. clients that start scanning during the major compaction stream back
>    entries as they finish processing from the major compaction, such that the
>    clients *only* see entries that have passed through the
>    compaction-time iterator.
> How accurate are these descriptions?  If #2 really were as I would like it
> to be, then a scan on the range (-inf,+inf) started after compaction would
> "monitor compaction progress," such that the first entry batch transmits to
> the scanner as soon as it is available from the major compaction, and the
> scanner finishes (receives all entries) exactly when the compaction
> finishes.  If this is not possible, I may make something to that effect by
> calling the blocking version of compact().
> Bonus: how does cancelCompaction()
> <>
> affect clients scanning in case #1 and case #2?
> Regards,
> Dylan Hutchison
