From: "Sukant Hajra"
To: accumulo-user@incubator.apache.org
Subject: Re: sanity checking application WALogs make sense
Date: Sat, 15 Sep 2012 13:14:49 -0500
Message-ID: <3321-1347732891-972412@sneakemail.com>

Excerpts from William Slacum's message of 2012-09-15 08:46:17 -0500:
>
> I'm a bit confused as to what you mean "if an iterator goes down
> mid-processing." If it goes down at all, then whatever scope it's running in-
> minor compaction, major compaction and scan- will most likely go down as well
> (unless your iterator eats an exception and ignores errors). A WALog
> shouldn't be deleted if whatever you were trying to do failed.

I believe I've answered my own question after thinking about iterators more
and looking at the code for some of the implementations.

I was thinking about iterators "writing" changes to Accumulo using something
like a BatchWriter. Now I'm coming to the conclusion that even if that were
possible, it is not how iterators were designed, and very likely bad for data
integrity.

I don't feel that iterators should have any side-effects beyond scanning data
through the source provided by the init() method. In this way, I'm beginning
to think about iterators more purely functionally. Does that sound right?
Or have people come up with iterator implementations with more side-effects?

For instance, in one of my algorithms, authors might write conflicting data to
a row that needs to be resolved. I feel I could install iterators at scan,
minor compaction, and major compaction to perform this resolution (which
happens to be a very simple idempotent operation).

Sorry if none of this sounds like a concrete question. Some of what I'm
looking for is conversation and validation in light of some limited local
Accumulo expertise on my team.

Has anyone thought about building up a small IRC community, say on #accumulo
on Freenode? There's a nice #hbase channel there, but at this point, I think
I'm past the point of asking Bigtable-general questions.

-Sukant