Mailing-List: user@accumulo.apache.org
From: William Slacum
To: user@accumulo.apache.org
Date: Sat, 15 Sep 2012 09:46:17 -0400
In-Reply-To: <14781-1347687856-274275@sneakemail.com>
Subject: Re: sanity checking application WALogs make sense

I'm a bit confused as to what you mean by "if an iterator goes down
mid-processing." If it goes down at all, then whatever scope it's running
in (minor compaction, major compaction, or scan) will most likely go down
as well, unless your iterator eats an exception and ignores errors. A WALog
shouldn't be deleted if whatever you were trying to do failed.

On Sat, Sep 15, 2012 at 1:44 AM, Sukant Hajra <qn2b6c2b9w@snkmail.com> wrote:

> Hi guys,
>
> We've been slowly inching towards using iterators more effectively. The
> typical use case of indexed docs fits one of our needs, and we wrote a
> prototype for it.
>
> We've recently realized that iterators are not just read-only, and that
> we can get more data-local functionality by taking advantage of their
> ability to mutate data as well. We've only begun to think about how this
> may assist us. A /lot/ of our critical data accesses are slightly
> complex, but local to one row. We have billions of entities in our
> system, so a simple bijection of entities to rows works out really well
> for us with respect to iterators.
>
> Up to this point, we've had a planned architecture that uses Kestrel for
> a WALog and a messaging system like Akka for pipelining work. Akka would
> help us manage flowing work from the user to the log, and from the log to
> orchestrations of Accumulo intra-row reads and writes. The log just helps
> us get faster response times without sacrificing too much reliability.
>
> Recently someone asked why we use our own WALog when Accumulo has one
> natively in HDFS. My response has been that Accumulo's WALog is at a
> lower level of granularity, that of mutations. We want reliable
> orchestrations of mutations. Our orchestrations are idempotent, but we
> want something along the lines of at-least-once delivery for the entire
> orchestration. If an iterator goes down mid-processing, I fear Accumulo's
> native WALog is insufficient to claim we have a reliable enough system.
>
> I could definitely go through the source code to validate this opinion,
> but I thought I'd bounce this reasoning off the list first.
>
> Also, I'm sure we're not the only people using Accumulo in this way.
> Please feel free to advise us if anyone's got other ideas for an
> architecture or feels we're thinking about the problem backwards.
>
> Thanks for your input,
> Sukant

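For readers following the thread, the "at-least-once delivery for the entire orchestration" idea in the quoted message can be sketched roughly as below. This is only an illustration in plain Python under stated assumptions: it does not use the Accumulo, Kestrel, or Akka APIs, and every name in it (InMemoryLog, apply_orchestration, FlakyApplier, drain) is hypothetical. The point it tries to show is that a log entry is acknowledged only after the whole orchestration has been applied, so a failure partway through leads to a replay, and the replay is harmless because the orchestration is idempotent.

```python
from collections import deque


class InMemoryLog:
    """Hypothetical stand-in for an application-level work log."""

    def __init__(self, entries):
        self._pending = deque(entries)

    def peek(self):
        # Return the oldest unacknowledged entry without removing it.
        return self._pending[0] if self._pending else None

    def ack(self):
        # Remove the oldest entry; called only after it was fully applied.
        self._pending.popleft()


def apply_orchestration(store, entry):
    """Idempotent orchestration: writes absolute values into a single row."""
    row, updates = entry
    for column, value in updates.items():
        store.setdefault(row, {})[column] = value


class FlakyApplier:
    """Assumed failure model: crash partway through the first attempt on one row."""

    def __init__(self, bad_row):
        self.bad_row = bad_row
        self.already_failed = False

    def __call__(self, store, entry):
        row, updates = entry
        if row == self.bad_row and not self.already_failed:
            self.already_failed = True
            column, value = next(iter(updates.items()))
            store.setdefault(row, {})[column] = value  # partial write survives
            raise RuntimeError("simulated crash mid-orchestration")
        apply_orchestration(store, entry)


def drain(log, store, apply):
    """Apply every entry at least once; return the number of attempts made."""
    attempts = 0
    while True:
        entry = log.peek()
        if entry is None:
            return attempts
        attempts += 1
        try:
            apply(store, entry)
        except RuntimeError:
            continue  # not acknowledged, so the same entry is retried
        log.ack()


log = InMemoryLog([("row1", {"a": 1, "b": 2}), ("row2", {"a": 3})])
store = {}
attempts = drain(log, store, FlakyApplier("row1"))
```

In this sketch the first attempt on row1 leaves a partial write behind, the entry stays in the log, and the retry overwrites the same columns with the same values. Whether Accumulo's native WALog gives an equivalent guarantee for a multi-mutation orchestration is exactly the open question in the thread; the sketch takes no position on that.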