accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <>
Subject Re: Scans during Compaction
Date Tue, 24 Feb 2015 15:16:13 GMT
Looking forward to it!!

Dylan Hutchison wrote:
> Good suggestion; I will follow up with a design document in the next few
> days.
> Creating idempotentency via indicator entries (in the column family,
> timestamp or something else) is one option to work in an iterator that
> should run once over a table's entries.  I think we may have the
> opportunity to solve a more general problem---scans merging data from
> multiple table sources with user-defined merge & compute functions---in
> addition to my use case by re-approaching the problem.  Think
> selective-scan->transform->join->transform->write-out.  Analytics on the
> server.
> My specific use case is table-table multiplication, treating table rows,
> column qualifiers and values as the components of a matrix.  We view a
> table as a /sparse/ matrix by treating non-present (row, colQ, value)
> entries as zeros in the matrix.  Accumulo offers an advantage when we
> only want to run /on selected ranges/ from the input tables, as opposed
> to running on whole tables where Yarn/Mapreduce may work better.  The
> table-table multiplication should run on tables in Accumulo, the result
> /persisting/ to Accumulo, so that we never need return values to the
> client.  We value /low latency/ over high throughput, so that we can
> perform multiplications interactively.  The user should have a way to
> /monitor/ multiplication progress, perhaps by a live count of the number
> of entries processed or (harder) a live sample of result entries from
> the multiplication.  The user should be able to /stop/ an operation
> midway once he decides enough entries processed.  In addition to
> interactivity, we may want to perform multiplications /in series/ and
> parallel.  They form base /building blocks/ for higher-level algorithms.
> I promise I will write these details up more formally, including how I
> made them work so far and putting them in more general context.  Will
> post in a separate thread.
> Regards,
> Dylan Hutchison
> On Mon, Feb 23, 2015 at 2:16 PM, Adam Fuchs <
> <>> wrote:
>     Dylan,
>     I think the way this is generally solved is by using an idempotent
>     iterator that can be applied at both full major compaction and query
>     scopes to give a consistent view. Aggregation, age-off filtering,
>     and all the other "standard" iterators have the property that you
>     can leave them in place and get a consistent answer even if they are
>     applied multiple times. Major compaction and query-time iterators
>     are even simpler than the general case, since you don't really need
>     to worry about partial views of the underlying data. In your case I
>     think you are trying to use an iterator that needs to be applied
>     exactly once to a complete stream of data (either at query time or
>     major compaction time). What we should probably do is look at
>     options for more generally supporting that type of iterator. You
>     could help us a ton by describing exactly what you want your
>     iterator to do, and we can all propose a few ideas for how this
>     might be implemented. Here are a couple off the top of my head:
>     1. If you can reform your iterator so that it is idempotent then you
>     can apply it liberally. This might be possible using some sort of
>     flag that the major compactor puts in the data and the query-time
>     iterator looks for to determine if the compaction has already
>     happened. We often use version numbers in column families to this
>     effect. Special row keys at the beginning of the tablet might also
>     be an option. This would be doable without changes to Accumulo.
>     2. We could build a mechanism into core accumulo that applies an
>     iterator with exactly once semantics, such that the user submits a
>     transformation as an iterator and it gets applied similarly to how
>     you described. The query-time reading of results of the major
>     compaction might be overkill, but that would be a possible
>     optimization that we could think about engineering in a second pass.
>     Adam
>     On Mon, Feb 23, 2015 at 1:42 PM, Dylan Hutchison
>     < <>> wrote:
>         Thanks Adam and Keith.
>         I see the following as a potential solution that achieves (1)
>         low latency for clients that want to see entries after an
>         iterator and (2) the entries from that iterator persisting in
>         the Accumulo table.
>          1. Start a major compaction in thread T1 of a client with the
>             iterator set, blocking until the compaction completes.
>          2. Start scanning in thread T2 of the client with the same
>             iterator now set at scan-time scope. Use an isolated scanner
>             to make sure we do not read the results of the major
>             compaction committing, though this is not full-proof due to
>             timing and that the isolated scanner is row-wise.
>          3. Eventually, T1 unblocks and signals that the compaction
>             completes.  T1 interrupts T2.
>          4. Thread T2 stops scanning, removes the scan-time iterator,
>             and starts scanning again at the point it last left off, now
>             seeing the results of the major compaction which already
>             passed through the iterator.
>         The whole scheme is only necessary if the client wants results
>         faster than the major compaction completes.  A disadvantage is
>         duplicated work -- the iterator runs at scan-time and at
>         compaction-time until the compaction finishes.  This may strain
>         server resources.
>         Will think about other schemes.  If only we could attach an
>         apply-once scan-time iterator, that also persists its results to
>         an Accumulo table in a streaming fashion.  Or on the flip side,
>         a one-time compaction iterator that streams results, such that
>         we could scan from them right away instead of needing to wait
>         for the entire compaction to complete.
>         Regards,
>         Dylan Hutchison
>         On Mon, Feb 23, 2015 at 12:48 PM, Adam Fuchs <
>         <>> wrote:
>             Dylan,
>             The effect of a major compaction is never seen in queries
>             before the major compaction completes. At the end of the
>             major compaction there is a multi-phase commit which
>             eventually replaces all of the old files with the new file.
>             At that point the major compaction will have completely
>             processed the given tablet's data (although other tablets
>             may not be synchronized). For long-running non-isolated
>             queries (more than a second or so) the iterator tree is
>             occasionally rebuilt and re-seeked. When it is rebuilt it
>             will use whatever is the latest file set, which will include
>             the results of a completed major compaction.
>             In your case #1 that's a tricky guarantee to make across a
>             whole tablet, but it can be made one row at a time by using
>             an isolated iterator.
>             To make your case #2 work, you probably will have to
>             implement some higher-level logic to only start your query
>             after the major compaction has completed, using an external
>             mechanism to track the completion of your transformation.
>             Adam
>             On Mon, Feb 23, 2015 at 12:35 PM, Dylan Hutchison
>             < <>> wrote:
>                 Hello all,
>                 When I initiate a full major compaction (with flushing
>                 turned on) manually via the Accumulo API
>                 <,,,%20java.util.List,%20boolean,%20boolean)>,
>                 how does the table appear to
>                  1. clients that started scanning the table before the
>                     major compaction began;
>                  2. clients that start scanning during the major compaction?
>                 I'm interested in the case where there is an iterator
>                 attached to the full major compaction that modifies
>                 entries (respecting sorted order of entries).
>                 The best possible answer for my use case, with case #2
>                 more important than case #1 and *low latency* more
>                 important than high throughput, is that
>                  1. clients that started scanning before the compaction
>                     began would not see entries altered by
>                     the compaction-time iterator;
>                  2. clients that start scanning during the major
>                     compaction stream back entries as they finish
>                     processing from the major compaction, such that the
>                     clients /only/ see entries that have passed through
>                     the compaction-time iterator.
>                 How accurate are these descriptions?  If #2 really were
>                 as I would like it to be, then a scan on the range
>                 (-inf,+inf) started after compaction would "monitor
>                 compaction progress," such that the first entry batch
>                 transmits to the scanner as soon as it is available from
>                 the major compaction, and the scanner finishes (receives
>                 all entries) exactly when the compaction finishes.  If
>                 this is not possible, I may make something to that
>                 effect by calling the blocking version of compact().
>                 Bonus: how does cancelCompaction()
>                 <>
>                 affect clients scanning in case #1 and case #2?
>                 Regards,
>                 Dylan Hutchison
>         --
> <>
> --
> <>

View raw message