accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dylan Hutchison <>
Subject Re: Scans during Compaction
Date Tue, 24 Feb 2015 06:28:23 GMT
Good suggestion; I will follow up with a design document in the next few

Creating idempotentency via indicator entries (in the column family,
timestamp or something else) is one option to work in an iterator that
should run once over a table's entries.  I think we may have the
opportunity to solve a more general problem---scans merging data from
multiple table sources with user-defined merge & compute functions---in
addition to my use case by re-approaching the problem.  Think
selective-scan->transform->join->transform->write-out.  Analytics on the

My specific use case is table-table multiplication, treating table rows,
column qualifiers and values as the components of a matrix.  We view a
table as a *sparse* matrix by treating non-present (row, colQ, value)
entries as zeros in the matrix.  Accumulo offers an advantage when we only
want to run *on selected ranges* from the input tables, as opposed to
running on whole tables where Yarn/Mapreduce may work better.  The
table-table multiplication should run on tables in Accumulo, the result
*persisting* to Accumulo, so that we never need return values to the
client.  We value *low latency* over high throughput, so that we can
perform multiplications interactively.  The user should have a way to
*monitor* multiplication progress, perhaps by a live count of the number of
entries processed or (harder) a live sample of result entries from the
multiplication.  The user should be able to *stop* an operation midway once
he decides enough entries processed.  In addition to interactivity, we may
want to perform multiplications *in series* and parallel.  They form
base *building
blocks* for higher-level algorithms.

I promise I will write these details up more formally, including how I made
them work so far and putting them in more general context.  Will post in a
separate thread.

Dylan Hutchison

On Mon, Feb 23, 2015 at 2:16 PM, Adam Fuchs <> wrote:

> Dylan,
> I think the way this is generally solved is by using an idempotent
> iterator that can be applied at both full major compaction and query scopes
> to give a consistent view. Aggregation, age-off filtering, and all the
> other "standard" iterators have the property that you can leave them in
> place and get a consistent answer even if they are applied multiple times.
> Major compaction and query-time iterators are even simpler than the general
> case, since you don't really need to worry about partial views of the
> underlying data. In your case I think you are trying to use an iterator
> that needs to be applied exactly once to a complete stream of data (either
> at query time or major compaction time). What we should probably do is look
> at options for more generally supporting that type of iterator. You could
> help us a ton by describing exactly what you want your iterator to do, and
> we can all propose a few ideas for how this might be implemented. Here are
> a couple off the top of my head:
> 1. If you can reform your iterator so that it is idempotent then you can
> apply it liberally. This might be possible using some sort of flag that the
> major compactor puts in the data and the query-time iterator looks for to
> determine if the compaction has already happened. We often use version
> numbers in column families to this effect. Special row keys at the
> beginning of the tablet might also be an option. This would be doable
> without changes to Accumulo.
> 2. We could build a mechanism into core accumulo that applies an iterator
> with exactly once semantics, such that the user submits a transformation as
> an iterator and it gets applied similarly to how you described. The
> query-time reading of results of the major compaction might be overkill,
> but that would be a possible optimization that we could think about
> engineering in a second pass.
> Adam
> On Mon, Feb 23, 2015 at 1:42 PM, Dylan Hutchison <>
> wrote:
>> Thanks Adam and Keith.
>> I see the following as a potential solution that achieves (1) low latency
>> for clients that want to see entries after an iterator and (2) the entries
>> from that iterator persisting in the Accumulo table.
>>    1. Start a major compaction in thread T1 of a client with the
>>    iterator set, blocking until the compaction completes.
>>    2. Start scanning in thread T2 of the client with the same iterator
>>    now set at scan-time scope. Use an isolated scanner to make sure we do not
>>    read the results of the major compaction committing, though this is not
>>    full-proof due to timing and that the isolated scanner is row-wise.
>>    3. Eventually, T1 unblocks and signals that the compaction
>>    completes.  T1 interrupts T2.
>>    4. Thread T2 stops scanning, removes the scan-time iterator, and
>>    starts scanning again at the point it last left off, now seeing the results
>>    of the major compaction which already passed through the iterator.
>> The whole scheme is only necessary if the client wants results faster
>> than the major compaction completes.  A disadvantage is duplicated work --
>> the iterator runs at scan-time and at compaction-time until the compaction
>> finishes.  This may strain server resources.
>> Will think about other schemes.  If only we could attach an apply-once
>> scan-time iterator, that also persists its results to an Accumulo table in
>> a streaming fashion.  Or on the flip side, a one-time compaction iterator
>> that streams results, such that we could scan from them right away instead
>> of needing to wait for the entire compaction to complete.
>> Regards,
>> Dylan Hutchison
>> On Mon, Feb 23, 2015 at 12:48 PM, Adam Fuchs <> wrote:
>>> Dylan,
>>> The effect of a major compaction is never seen in queries before the
>>> major compaction completes. At the end of the major compaction there is a
>>> multi-phase commit which eventually replaces all of the old files with the
>>> new file. At that point the major compaction will have completely processed
>>> the given tablet's data (although other tablets may not be synchronized).
>>> For long-running non-isolated queries (more than a second or so) the
>>> iterator tree is occasionally rebuilt and re-seeked. When it is rebuilt it
>>> will use whatever is the latest file set, which will include the results of
>>> a completed major compaction.
>>> In your case #1 that's a tricky guarantee to make across a whole tablet,
>>> but it can be made one row at a time by using an isolated iterator.
>>> To make your case #2 work, you probably will have to implement some
>>> higher-level logic to only start your query after the major compaction has
>>> completed, using an external mechanism to track the completion of your
>>> transformation.
>>> Adam
>>> On Mon, Feb 23, 2015 at 12:35 PM, Dylan Hutchison <>
>>> wrote:
>>>> Hello all,
>>>> When I initiate a full major compaction (with flushing turned on)
>>>> manually via the Accumulo API
>>>> <,,,%20java.util.List,%20boolean,%20boolean)>,
>>>> how does the table appear to
>>>>    1. clients that started scanning the table before the major
>>>>    compaction began;
>>>>    2. clients that start scanning during the major compaction?
>>>> I'm interested in the case where there is an iterator attached to the
>>>> full major compaction that modifies entries (respecting sorted order of
>>>> entries).
>>>> The best possible answer for my use case, with case #2 more important
>>>> than case #1 and *low latency* more important than high throughput, is
>>>> that
>>>>    1. clients that started scanning before the compaction began would
>>>>    not see entries altered by the compaction-time iterator;
>>>>    2. clients that start scanning during the major compaction stream
>>>>    back entries as they finish processing from the major compaction, such
>>>>    the clients *only* see entries that have passed through the
>>>>    compaction-time iterator.
>>>> How accurate are these descriptions?  If #2 really were as I would like
>>>> it to be, then a scan on the range (-inf,+inf) started after compaction
>>>> would "monitor compaction progress," such that the first entry batch
>>>> transmits to the scanner as soon as it is available from the major
>>>> compaction, and the scanner finishes (receives all entries) exactly when
>>>> the compaction finishes.  If this is not possible, I may make something to
>>>> that effect by calling the blocking version of compact().
>>>> Bonus: how does cancelCompaction()
>>>> <>
>>>> affect clients scanning in case #1 and case #2?
>>>> Regards,
>>>> Dylan Hutchison
>> --


View raw message