Message-ID: <54EB7E35.3080808@gmail.com>
Date: Mon, 23 Feb 2015 14:23:33 -0500
From: Josh Elser <josh.elser@gmail.com>
To: user@accumulo.apache.org
Subject: Re: Scans during Compaction

Is the iterator that rewrites data during compaction idempotent? If you
can apply the same function (the iterator) multiple times over the data
(maybe only at scan time, maybe at scan time and again in a major
compaction), the only concern is doing a bit more work on the servers.
Given that you have many servers, that wouldn't be a big worry IMO. This
wouldn't require the trickiness you outlined, no?
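Something like this rough sketch, for instance. MyRewritingIterator, the
instance name, table name, and credentials are all placeholders, and the
compact() call here is the 1.6 TableOperations variant:

import java.util.Collections;
import org.apache.accumulo.core.client.*;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;

public class RewriteBothScopes {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
        .getConnector("user", new PasswordToken("secret"));

    // One IteratorSetting, reused in both scopes. MyRewritingIterator
    // stands in for your (assumed idempotent) iterator.
    IteratorSetting is =
        new IteratorSetting(50, "rewrite", MyRewritingIterator.class);

    // Scan-time scope: clients see rewritten entries immediately.
    Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
    scanner.addScanIterator(is);

    // Compaction-time: the same iterator persists its output.
    // args: table, start row, end row, iterators, flush, wait
    conn.tableOperations().compact("mytable", null, null,
        Collections.singletonList(is), true, false);
  }
}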
Dylan Hutchison wrote:
> Thanks Adam and Keith.
>
> I see the following as a potential solution that achieves (1) low
> latency for clients that want to see entries after an iterator and
> (2) persistence of that iterator's output in the Accumulo table.
>
> 1. Start a major compaction in thread T1 of a client with the
>    iterator set, blocking until the compaction completes.
> 2. Start scanning in thread T2 of the client with the same iterator,
>    now set at scan-time scope. Use an isolated scanner to make sure
>    we do not read the partially committed results of the major
>    compaction, though this is not fool-proof, both because of timing
>    and because the isolated scanner works row by row.
> 3. Eventually T1 unblocks, signaling that the compaction has
>    completed. T1 interrupts T2.
> 4. Thread T2 stops scanning, removes the scan-time iterator, and
>    resumes scanning at the point where it left off, now seeing the
>    results of the major compaction, which have already passed through
>    the iterator.
>
> The whole scheme is only necessary if the client wants results faster
> than the major compaction completes. A disadvantage is duplicated
> work -- the iterator runs at scan time and at compaction time until
> the compaction finishes, which may strain server resources.
>
> Will think about other schemes. If only we could attach an apply-once
> scan-time iterator that also persists its results to an Accumulo
> table in a streaming fashion. Or, on the flip side, a one-time
> compaction iterator that streams its results, so that we could scan
> them right away instead of waiting for the entire compaction to
> complete.
>
> Regards,
> Dylan Hutchison
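A rough sketch of those four steps, with the same placeholders as above
(MyRewritingIterator, "mytable", process()); for simplicity it uses a
shared flag rather than an interrupt for step 3, and omits error
handling:

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.accumulo.core.client.*;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class CompactWhileScanning {
  static final AtomicBoolean compacted = new AtomicBoolean(false);

  public static void scan(final Connector conn) throws Exception {
    final IteratorSetting is =
        new IteratorSetting(50, "rewrite", MyRewritingIterator.class);

    // Step 1: T1 runs a blocking full major compaction with the iterator.
    Thread t1 = new Thread(new Runnable() {
      public void run() {
        try {
          conn.tableOperations().compact("mytable", null, null,
              Collections.singletonList(is), true /*flush*/, true /*wait*/);
          compacted.set(true); // step 3: signal the scanning thread
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    });
    t1.start();

    // Step 2: T2 (here, the calling thread) scans with the same iterator
    // at scan-time scope, isolated so a row is never half old, half new.
    Scanner scanner = new IsolatedScanner(
        conn.createScanner("mytable", Authorizations.EMPTY));
    scanner.addScanIterator(is);

    Key lastKey = null;
    for (Map.Entry<Key,Value> entry : scanner) {
      lastKey = entry.getKey();
      process(entry); // placeholder for client-side handling
      if (compacted.get())
        break; // the compaction finished while we were scanning
    }

    // Step 4: drop the scan-time iterator and resume where we left off;
    // the compacted files already hold the rewritten entries.
    if (compacted.get() && lastKey != null) {
      scanner.removeScanIterator("rewrite");
      scanner.setRange(new Range(lastKey, false, null, false));
      for (Map.Entry<Key,Value> entry : scanner) {
        process(entry);
      }
    }
  }

  static void process(Map.Entry<Key,Value> entry) { /* placeholder */ }
}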
> On Mon, Feb 23, 2015 at 12:48 PM, Adam Fuchs wrote:
>> Dylan,
>>
>> The effect of a major compaction is never seen in queries before the
>> major compaction completes. At the end of the major compaction there
>> is a multi-phase commit which eventually replaces all of the old
>> files with the new file. At that point the major compaction will
>> have completely processed the given tablet's data (although other
>> tablets may not be synchronized). For long-running, non-isolated
>> queries (more than a second or so) the iterator tree is occasionally
>> rebuilt and re-seeked. When it is rebuilt, it uses whatever is the
>> latest file set, which will include the results of a completed major
>> compaction.
>>
>> In your case #1 that's a tricky guarantee to make across a whole
>> tablet, but it can be made one row at a time by using an isolated
>> scanner.
>>
>> To make your case #2 work, you will probably have to implement some
>> higher-level logic that starts your query only after the major
>> compaction has completed, using an external mechanism to track the
>> completion of your transformation.
>>
>> Adam
>>
>> On Mon, Feb 23, 2015 at 12:35 PM, Dylan Hutchison wrote:
>>> Hello all,
>>>
>>> When I initiate a full major compaction (with flushing turned on)
>>> manually via the Accumulo API, how does the table appear to
>>>
>>> 1. clients that started scanning the table before the major
>>>    compaction began;
>>> 2. clients that start scanning during the major compaction?
>>>
>>> I'm interested in the case where there is an iterator attached to
>>> the full major compaction that modifies entries (respecting the
>>> sorted order of entries).
>>>
>>> The best possible answer for my use case, with case #2 more
>>> important than case #1 and *low latency* more important than high
>>> throughput, is that
>>>
>>> 1. clients that started scanning before the compaction began would
>>>    not see entries altered by the compaction-time iterator;
>>> 2. clients that start scanning during the major compaction stream
>>>    back entries as they finish processing in the major compaction,
>>>    such that the clients /only/ see entries that have passed
>>>    through the compaction-time iterator.
>>>
>>> How accurate are these descriptions? If #2 really were as I would
>>> like it to be, then a scan over the range (-inf,+inf) started after
>>> the compaction began would "monitor compaction progress," such that
>>> the first entry batch transmits to the scanner as soon as it is
>>> available from the major compaction, and the scanner finishes
>>> (receives all entries) exactly when the compaction finishes. If
>>> this is not possible, I may build something to that effect by
>>> calling the blocking version of compact().
>>>
>>> Bonus: how does cancelCompaction() affect clients scanning in case
>>> #1 and case #2?
>>>
>>> Regards,
>>> Dylan Hutchison
>
> --
> www.cs.stevens.edu/~dhutchis