From: John Vines
Reply-To: vines@apache.org
Date: Fri, 21 Sep 2012 10:25:57 -0400
To: user@accumulo.apache.org
Subject: Re: EXTERNAL: Re: Failing Tablet Servers

memory.maps is what defines the size of the in-memory map. When using native maps, that space does not come out of the heap; when using non-native maps, it does come out of the heap.

I think the issue Eric is getting at is the fickleness of the Java garbage collector. When you give a process that much heap, that's much more data you can hold before you need to garbage collect. However, it also means that when it does garbage collect, it is collecting a LOT more, which can result in poor performance.

John
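As a rough sketch only, here is how the 6G map / 4G heap split Eric suggests below might look on a 10G tablet server, assuming native maps are enabled (tserver.memory.maps.native.enabled) and the 1.4-era conf files; the exact file and variable names should be checked against the version in use:

    <!-- conf/accumulo-site.xml: cap for the native (off-heap) in-memory map -->
    <property>
      <name>tserver.memory.maps.max</name>
      <value>6G</value>
    </property>

    # conf/accumulo-env.sh: with the map off-heap, the tserver JVM heap can shrink
    export ACCUMULO_TSERVER_OPTS="-Xmx4g -Xms4g"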
On Fri, Sep 21, 2012 at 10:12 AM, Cardon, Tejay E wrote:

> Jim, Eric, and Adam,
>
> Thanks. It sounds like you're all saying the same thing. Originally I was
> doing each key/value as its own mutation, and it was blowing up much faster
> (probably due to the volume/overhead of the mutation objects themselves).
> I'll try refactoring to break them up into something in between. My keys
> are small (<25 bytes) and my values are empty, but I'll aim for ~1,000
> key/values per mutation and see how that works out for me.
>
> Eric,
>
> I was under the impression that the memory.maps setting was not very
> important when using native maps. Apparently I'm mistaken there. What does
> this setting control in a native map setting? And, in general, what's the
> proper balance between tserver_opts and tserver.memory.maps?
>
> With regard to the "Finished gathering information from 24 servers in
> 27.45 seconds": do you have any recommendations for how to chase down the
> bottleneck? I'm pretty sure I'm having GC issues, but I'm not sure what is
> causing them on the server side. I'm sending a fairly small number of very
> large mutation objects, which I'd expect to be a moderate problem for the
> GC, but not a huge one.
>
> Thanks again to everyone for being so responsive and helpful.
>
> Tejay Cardon
>
> From: Eric Newton [mailto:eric.newton@gmail.com]
> Sent: Friday, September 21, 2012 8:03 AM
> To: user@accumulo.apache.org
> Subject: EXTERNAL: Re: Failing Tablet Servers
>
> A few items noted from your logs:
>
> tserver.memory.maps.max = 1G
>
> If you are giving your processes 10G, you might want to make the map
> larger, say 6G, and then reduce the JVM by 6G.
>
> Write-Ahead Log recovery complete for rz<;zw== (8 mutations applied,
> 8000000 entries created)
>
> You are creating rows with 1M columns. This is OK, but you might want to
> write them out more incrementally.
>
> WARN : Running low on memory
>
> That's pretty self-explanatory. I'm guessing that the very large mutations
> are causing the tablet servers to run out of memory before they are held
> waiting for minor compactions.
>
> Finished gathering information from 24 servers in 27.45 seconds
>
> Something is running slow, probably due to GC thrashing.
>
> WARN : Lost servers [10.1.24.69:9997[139d46130344b98]]
>
> And there's a server crashing, probably due to an OOM condition.
>
> Send smaller mutations. Maybe keep it to 200K column updates. You can
> still have 1M-wide rows; just send 5 mutations.
>
> -Eric
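To make that last point concrete, a minimal sketch of the chunking with the 1.4-era BatchWriter client API; the instance, table, and credential names are placeholders, and the client signatures should be checked against the version in use:

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class ChunkedRowIngest {
      public static void main(String[] args) throws Exception {
        // Placeholder instance and credentials -- substitute your own.
        Connector conn = new ZooKeeperInstance("myInstance", "zoo1:2181")
            .getConnector("ingestUser", "secret".getBytes());
        // 1.4-style arguments: max buffered bytes, max latency (ms), write threads
        BatchWriter writer = conn.createBatchWriter("wideTable", 50000000L, 60000L, 4);

        Text row = new Text("row_0001");      // one logical row, ~1M columns wide
        Value empty = new Value(new byte[0]); // values are empty in this workload
        int updatesPerMutation = 200000;      // suggested cap per Mutation

        Mutation m = new Mutation(row);
        for (int i = 0; i < 1000000; i++) {
          m.put(new Text("fam_" + (i % 160000)), new Text("qual_" + i), empty);
          // Ship a partial row every 200K column updates instead of building one
          // enormous Mutation; the row is still 1M columns wide once written.
          if (m.size() >= updatesPerMutation) {
            writer.addMutation(m);
            m = new Mutation(row);
          }
        }
        if (m.size() > 0)
          writer.addMutation(m);
        writer.close();
      }
    }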
> On Thu, Sep 20, 2012 at 5:05 PM, Cardon, Tejay E wrote:
>
> I'm seeing some strange behavior on a moderate (30 node) cluster. I've got
> 27 tablet servers on large Dell servers with 30GB of memory each. I've set
> the TServer_OPTS to give them each 10G of memory. I'm running an ingest
> process that uses AccumuloInputFormat in a MapReduce job to write 1,000
> rows, with each row containing ~1,000,000 columns in 160,000 families. The
> MapReduce initially runs quite quickly, and I can see the ingest rate peak
> on the monitor page. However, after about 30 seconds of high ingest, the
> ingest falls to 0. It then stalls out, and my map tasks are eventually
> killed. In the end, the map/reduce fails and I usually end up with between
> 3 and 7 of my tservers dead.
>
> Inspecting the tserver.err logs shows nothing, even on the nodes that
> fail. The tserver.out log shows a java OutOfMemoryError, and nothing else.
> I've included a zip with the logs from one of the failed tservers and a
> second one with the logs from the master. Other than the out of memory,
> I'm not seeing anything that stands out to me.
>
> If I reduce the data size to only 100,000 columns, rather than 1,000,000,
> the process takes about 4 minutes and completes without incident.
>
> Am I just ingesting too quickly?
>
> Thanks,
> Tejay Cardon
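For the MapReduce ingest described above, the same idea applies inside the map task: emit several moderate Mutation objects per row to AccumuloOutputFormat rather than one giant one. A rough, hypothetical mapper sketch (job wiring and input parsing omitted; the table and family names are placeholders):

    import java.io.IOException;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Each input line names one wide row; its ~1M columns are emitted as several
    // Mutations so no single Mutation (or map task) balloons in memory.
    public class WideRowMapper extends Mapper<LongWritable, Text, Text, Mutation> {
      private static final Text TABLE = new Text("wideTable"); // placeholder
      private static final int UPDATES_PER_MUTATION = 200000;

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        Text row = new Text(line.toString());
        Value empty = new Value(new byte[0]);

        Mutation m = new Mutation(row);
        for (int i = 0; i < 1000000; i++) {
          m.put(new Text("fam_" + (i % 160000)), new Text("qual_" + i), empty);
          if (m.size() >= UPDATES_PER_MUTATION) {
            context.write(TABLE, m); // AccumuloOutputFormat takes (table, Mutation)
            m = new Mutation(row);
          }
        }
        if (m.size() > 0)
          context.write(TABLE, m);
      }
    }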