Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@accumulo.apache.org
Received-SPF: pass (athena.apache.org: local policy includes SPF record at
 spf.trusted-forwarder.org)
MIME-Version: 1.0
In-Reply-To: 
 <CAMz+DuvmmHegOn9EJeHR9H_rRpP50L2QZ53BbdruVO0pirArQw@mail.gmail.com>
References: 
 <CAMz+DuvmmHegOn9EJeHR9H_rRpP50L2QZ53BbdruVO0pirArQw@mail.gmail.com>
Date: Fri, 20 Jun 2014 10:32:56 -0400
Message-ID: 
 <CAGUtCHpHwo3NKG1A9nR1WAy1aQOjYCq_pzRgWW+34ZpSMgrsnw@mail.gmail.com>
Subject: Re: BatchWriter woes
From: Keith Turner <keith@deenlo.com>
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=001a1133f7709022ac04fc455ea2

--001a1133f7709022ac04fc455ea2
Content-Type: text/plain; charset=UTF-8

On Thu, Jun 19, 2014 at 11:57 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> I'm finding some ingest jobs I have running in a bit of a sticky sitch:
>
> I have a MapReduce job that reads a table, transforms the entries, creates
> an inverted index, and writes out mutations to two tables. The cluster size
> is in the tens of nodes, and I usually have 32 mappers running.
>
> The batch writer configs are:
> - memory buffer: 128MB
> - max latency: 5 minutes
> - threads: 32
> - timeout: default Long.MAX_VALUE
>
> I know we're on Accumulo 1.5.0 and I believe using CDH 4.5.0, Zookeeper
> 3.3.6.
>
> I'm noticing an ingest pattern of usually ok rates for the cluster (in the
> 100K+ entries per second), but after some time they start to drop off to
> ~10K E/s. Sometimes this happens when a round of compactions kicks off
> (usually major, not minor), sometimes not. Eventually, the mappers will
> time out. We have them set to time out after 10 minutes of not reporting
> status.
>
> I added a bit of probing/profiling, and noticed that there's an
> exponential growth in per entry processing time in the mapper. They're of
> pretty uniform size, so there should not be much variance in the times. The
> times go from single milliseconds, to hundreds of milliseconds, to seconds,
> to minutes.
>
> If I jstack a mapper, it's sitting in TabletServerBatchWriter#waitRTE. It
> should only enter that method if the batch writer has (a) too much data
> buffered or (b) the user requested a flush. I'm inferring that (a) is the
> case, because there is no explicit TabletServerBatchWriter#flush() call.
>
> We did notice that there was a send thread trying to send to a dead
> server. We can't ssh to the IP it was trying to send to, and have verified
> manually that it's not listed in the current tablet servers. We did notice
> that the master log is reporting that a recovery on a WAL associated with
> that IP is under way. Looking back, the master had been reporting that
> message for about a day and a half. The message was similar to the one
> described in https://issues.apache.org/jira/browse/ACCUMULO-1364 . I do
> not know the significance of this as it relates to my jobs.
>

Do you think it's trying to write to a half-dead server?  Does that server
still have locations in the metadata table?


>
> I did some digging in TabletServerBatchWriter, and the only thing I can
> kind of see happening is that if SendTask#sendMutationsToTabletServer
> receives a TException, it rethrows it as an IOException, then SendTask#send
> will catch that exception and add the mutations to the failures collection.
> Since the timeout is Long.MAX_VALUE, I think it's possible this loop can
> continue forever or until some outside force kills the entire process.
>
> Does this seem coherent? Is there anything else that could cause this?
>
> I'm on the track of converting the code over to using bulk ingest, but I
> think there's an issue with a vanilla BatchWriter that I would just be
> getting around instead of actually fixing.
>
> Also, I'd love to provide logs, but there's a high amount of friction in
> getting them, so I won't be able to deliver on that front.
>
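
In case it helps reason about the retry theory above, here is a toy model of
a send loop that re-queues failed mutations until a timeout expires. This is
not Accumulo code: RetryLoopSketch, drain, and all of its parameters are made
up for illustration, the clock is simulated, and maxAttempts exists only so
the sketch terminates when the timeout is Long.MAX_VALUE. In the real client
I believe the corresponding setting is BatchWriterConfig#setTimeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a writer retrying failed sends. Hypothetical, not Accumulo code. */
class RetryLoopSketch {

    /** Outcome of trying to drain the buffer against one server. */
    static final class Result {
        final boolean timedOut;   // true if the timeout stopped the loop
        final long attempts;      // number of send attempts made
        final int stillBuffered;  // mutations left in the buffer

        Result(boolean timedOut, long attempts, int stillBuffered) {
            this.timedOut = timedOut;
            this.attempts = attempts;
            this.stillBuffered = stillBuffered;
        }
    }

    /**
     * @param mutations    number of buffered mutations
     * @param serverAlive  whether a send succeeds
     * @param timeoutMs    give up once this much simulated time has passed
     * @param retryDelayMs simulated time consumed by each failed attempt
     * @param maxAttempts  safety cap so the sketch always terminates
     */
    static Result drain(int mutations, boolean serverAlive, long timeoutMs,
                        long retryDelayMs, long maxAttempts) {
        Deque<Integer> buffer = new ArrayDeque<>();
        for (int i = 0; i < mutations; i++) {
            buffer.add(i);
        }
        long clockMs = 0;
        long attempts = 0;
        while (!buffer.isEmpty() && attempts < maxAttempts) {
            if (clockMs >= timeoutMs) {
                return new Result(true, attempts, buffer.size());
            }
            attempts++;
            Integer mutation = buffer.poll();
            if (!serverAlive) {
                // Failed send: the mutation goes back into the buffer,
                // so the buffer never shrinks while the server is dead.
                buffer.add(mutation);
                clockMs += retryDelayMs;
            }
        }
        return new Result(false, attempts, buffer.size());
    }

    public static void main(String[] args) {
        Result unbounded = drain(3, false, Long.MAX_VALUE, 100, 1000);
        Result bounded = drain(3, false, 500, 100, 1000);
        System.out.println("unbounded timeout: timedOut=" + unbounded.timedOut
                + " attempts=" + unbounded.attempts
                + " buffered=" + unbounded.stillBuffered);
        System.out.println("500 ms timeout:    timedOut=" + bounded.timedOut
                + " attempts=" + bounded.attempts
                + " buffered=" + bounded.stillBuffered);
    }
}
```

With an unbounded timeout the model only stops at the artificial attempt cap
and the buffer stays full, which would look like a caller blocked waiting for
buffer space. With a finite timeout the loop gives up and the failure can be
surfaced to the caller.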

--001a1133f7709022ac04fc455ea2--