accumulo-user mailing list archives

From Dylan Hutchison
Subject Scans during Compaction
Date Mon, 23 Feb 2015 17:35:33 GMT
Hello all,

When I initiate a full major compaction (with flushing turned on) manually via
the Accumulo API (the compact() overload that takes a java.util.List and two
booleans), how does the table appear to

   1. clients that started scanning the table before the major compaction
   2. clients that start scanning during the major compaction?

I'm interested in the case where there is an iterator attached to the full
major compaction that modifies entries (respecting sorted order of entries).

The best possible answer for my use case, with case #2 more important than
case #1 and *low latency* more important than high throughput, is that

   1. clients that started scanning before the compaction began would not
   see entries altered by the compaction-time iterator;
   2. clients that start scanning during the major compaction stream back
   entries as they finish processing from the major compaction, such that the
   clients *only* see entries that have passed through the compaction-time
   iterator.

How accurate are these descriptions?  If #2 really were as I would like it
to be, then a scan on the range (-inf,+inf) started after the compaction
begins would "monitor compaction progress": the first batch of entries is
sent to the scanner as soon as the major compaction has produced it, and the
scanner finishes (receives all entries) exactly when the compaction
finishes.  If this is not possible, I may approximate it by calling the
blocking version of compact() and scanning only after it returns.
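
To make the two desired behaviors concrete, here is a toy model in plain
Java. It is only a sketch of what I would like to see, not a description of
what Accumulo actually does, and it uses no Accumulo classes: the names
CompactionScanSketch, scanStartedBefore, and scanStartedDuring are made up
for illustration, and a background thread stands in for the major compaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Toy model only: plain Java, no Accumulo classes. All names are hypothetical.
class CompactionScanSketch {
    // Marker the "compaction" thread enqueues after its last entry.
    // (A real value equal to this marker would end the scan early; fine for a toy.)
    private static final String DONE = "\u0000DONE";

    // Desired case #1: a scan that began before the compaction reads the
    // original values and never sees values rewritten by the compaction.
    static List<String> scanStartedBefore(TreeMap<String, String> table) {
        return new ArrayList<>(table.values());
    }

    // Desired case #2: a scan that begins during the compaction receives each
    // entry as soon as the compaction-time transform has produced it, in sorted
    // key order, and ends when the compaction ends.
    static List<String> scanStartedDuring(TreeMap<String, String> table,
                                          UnaryOperator<String> compactionIterator) {
        BlockingQueue<String> stream = new LinkedBlockingQueue<>();
        Thread compaction = new Thread(() -> {
            for (String value : table.values()) {
                stream.add(compactionIterator.apply(value)); // entry is now "compacted"
            }
            stream.add(DONE);
        });
        compaction.start();

        List<String> seen = new ArrayList<>();
        try {
            // take() blocks until the compaction thread has produced the next entry.
            for (String v = stream.take(); !v.equals(DONE); v = stream.take()) {
                seen.add(v);
            }
            compaction.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "a");
        table.put("row2", "b");
        System.out.println(scanStartedBefore(table));                       // prints [a, b]
        System.out.println(scanStartedDuring(table, String::toUpperCase)); // prints [A, B]
    }
}
```

In this model the case #2 scanner never sees an unprocessed entry, because
it only reads what the compaction thread has already emitted.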

Bonus: how does cancelCompaction() affect clients scanning in case #1 and
case #2?

Dylan Hutchison
