Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@accumulo.apache.org
Received-SPF: softfail (nike.apache.org: transitioning domain of
 wilhelm.von.cloud@accumulo.net does not designate 209.85.219.41 as permitted
 sender)
MIME-Version: 1.0
In-Reply-To: <14781-1347687856-274275@sneakemail.com>
References: <14781-1347687856-274275@sneakemail.com>
Date: Sat, 15 Sep 2012 09:46:17 -0400
Message-ID: 
 <CAMz+DuvZxH2D+P5QLgePWGkAvC1dwXJrmDbVhHSkYYd-1yicvA@mail.gmail.com>
Subject: Re: sanity checking application WALogs make sense
From: William Slacum <wilhelm.von.cloud@accumulo.net>
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=e89a8fb1f350c1746f04c9bdc339

--e89a8fb1f350c1746f04c9bdc339
Content-Type: text/plain; charset=ISO-8859-1

I'm a bit confused as to what you mean by "if an iterator goes down
mid-processing." If it goes down at all, then whatever scope it's running
in (minor compaction, major compaction, or scan) will most likely go down
as well, unless your iterator eats an exception and ignores errors. A
WALog shouldn't be deleted if whatever you were trying to do failed.

On Sat, Sep 15, 2012 at 1:44 AM, Sukant Hajra <qn2b6c2b9w@snkmail.com> wrote:

> Hi guys,
>
> We've been slowly inching towards using iterators more effectively.  The
> typical use case of indexed docs fits one of our needs, and we wrote a
> prototype for it.
>
> We've recently realized that iterators are not just read-only, and that we
> can get more data-local functionality by taking advantage of their ability
> to mutate data as well.  We've only begun to think about how this may
> assist us.  A /lot/ of our critical data accesses are slightly complex, but
> local to one row.  We have billions of entities in our system, so a simple
> bijection of entities to rows works out really well for us with respect to
> iterators.
>
> Up to this point, we've had a planned architecture that uses Kestrel for a
> WALog and a messaging system like Akka for pipelining work.  Akka would
> help us manage flowing work from the user to the log and from the log to
> orchestrations of Accumulo intra-row reads and writes.  The log just helps
> us get some faster response time without sacrificing too much reliability.
>
> Recently someone asked why we'd use our own WALog when Accumulo has one
> natively in HDFS.  My response has been that Accumulo's WALog works at a
> lower level of granularity: individual mutations.  We want reliable
> orchestrations of mutations.  Our orchestrations are idempotent, but we
> want something along the lines of at-least-once delivery for the entire
> orchestration.  If an iterator goes down mid-processing, I fear Accumulo's
> native WALog is insufficient to claim we have a reliable enough system.
>
> I could definitely go through source code to validate this opinion, but I
> thought I'd bounce this reasoning off the list first.
>
> Also, I'm sure we're not the only people using Accumulo in this way.
> Please feel free to advise us if anyone's got other ideas for an
> architecture or feels we're thinking about the problem backwards.
>
> Thanks for your input,
> Sukant
>
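
To make the "at-least-once delivery for the entire orchestration" idea in
the quoted message concrete, here is a minimal sketch. It assumes a
hypothetical in-memory log and table and does not use Kestrel, Akka, or any
Accumulo API. The names set_cell, run_orchestration, drain, and crash_plan
are made up for illustration. An entry is removed from the log only after
every step of its orchestration has succeeded, so a crash partway through
leads to a full replay, which is safe because each step is idempotent.

```python
from collections import deque


class FlakyStep(Exception):
    """Raised to simulate a worker dying mid-orchestration."""


def set_cell(row, column, value):
    # Idempotent: writing the same value twice leaves the row unchanged.
    row[column] = value


def run_orchestration(row, steps, fail_at=None):
    # Apply each (column, value) step in order.
    # If fail_at is set, simulate a crash just before that step index.
    for i, (column, value) in enumerate(steps):
        if fail_at is not None and i == fail_at:
            raise FlakyStep(f"crashed before step {i}")
        set_cell(row, column, value)


def drain(log, table, crash_plan):
    # At-least-once delivery: an entry leaves the log only after the whole
    # orchestration has succeeded, so a crash causes a full replay.
    attempts = 0
    while log:
        row_id, steps = log[0]  # peek; do not remove yet
        attempts += 1
        try:
            run_orchestration(
                table.setdefault(row_id, {}),
                steps,
                # Hypothetical fault injection: crash once per listed row.
                fail_at=crash_plan.pop(row_id, None),
            )
        except FlakyStep:
            continue  # entry is still in the log; retry it
        log.popleft()  # acknowledge only after success
    return attempts


log = deque([("entity-1", [("name", "a"), ("count", "1"), ("state", "done")])])
table = {}
attempts = drain(log, table, crash_plan={"entity-1": 2})
```

In this run the first attempt writes two cells and then fails, and the
second attempt rewrites them and finishes the orchestration. Whether a
separate log like this is needed on top of Accumulo's own WALog is exactly
the open question in this thread; the sketch only shows the delivery
pattern, not a recommendation.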

--e89a8fb1f350c1746f04c9bdc339--