Subject: Re: Node OOM Problems
From: Wayne <wav100@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 20 Aug 2010 20:13:07 +0200

I deleted ALL data and reset the nodes from scratch. There are no more large
rows in there: 8-9 megs MAX across all nodes. This appears to be a new
problem. I restarted the node in question and it seems to be running fine,
but I had to run repair on it as it appears to be missing a lot of data.
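
For anyone hitting the same thing, a rough sketch of the nodetool checks
referred to in this thread (tpstats backlog, compacted row sizes, repair),
assuming 0.6-style "-host" flags and a placeholder host name; exact flag
spellings vary between versions:

    # Per-stage thread pool backlog; pending MESSAGE-DESERIALIZER-POOL tasks show up here
    nodetool -host node1.example.com tpstats

    # Per-column-family statistics, including compacted row sizes (only populated after a compaction)
    nodetool -host node1.example.com cfstats

    # Re-sync a node that came back up missing data
    nodetool -host node1.example.com repair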

On Fri, Aug 20, 2010 at 7:51 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> On Fri, Aug 20, 2010 at 1:17 PM, Wayne <wav100@gmail.com> wrote:
> > I turned off the creation of the secondary indexes which had the large
> > rows and all seemed good. Thank you for the help. I was getting 60k+
> > writes/second on the 6-node cluster.
> >
> > Unfortunately, again three hours later a node went down. I cannot even
> > look at the logs from when it started, since they are gone/recycled due
> > to millions of message deserialization messages. What are these? The
> > node had 12,098,067 pending message-deserializer-pool entries in
> > tpstats. The node was up according to some nodes and down according to
> > others, which left it flapping while still trying to take requests. What
> > is the log warning "deserialization task dropped message"? Why would a
> > node have 12 million of these?
> >
> >  WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-20 16:57:02,602
> > MessageDeserializationTask.java (line 47) dropping message (1,078,378ms
> > past timeout)
> >  WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-20 16:57:02,602
> > MessageDeserializationTask.java (line 47) dropping message (1,078,378ms
> > past timeout)
> >
> > I do not think this is a large row problem any more. All nodes show a
> > max row size around 8-9 megs.
> >
> > I looked at the munin charts, and the disk IO seems to have spiked along
> > with compaction. Could compaction kicking in cause this? I have added
> > the 3 JVM settings to make compaction a lower priority. Did that
> > contribute by slowing compaction down and letting it build up on a
> > heavily loaded system?
> >
> > Thanks in advance for any help someone can provide.
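
For reference, the "3 JVM settings" mentioned above are presumably the
compaction-priority options for 0.6; if so, they amount to appending the
following to JVM_OPTS (assuming bin/cassandra.in.sh is where they are set):

    # Enable thread priorities (the 42 value lets them take effect without root
    # on Linux) and run compaction at the lowest priority, so the read/write
    # stages get scheduled ahead of compaction
    JVM_OPTS="$JVM_OPTS -XX:+UseThreadPriorities"
    JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
    JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1"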

> >
> > On Fri, Aug 20, 2010 at 8:34 AM, Wayne <wav100@gmail.com> wrote:
> >>
> >> The NullPointerException does not crash the node. It only makes it
> >> flap/go down for a short period and then it comes back up. I do not see
> >> anything abnormal in the system log, only that single error in the
> >> cassandra.log.
> >>
> >>
> >> On Thu, Aug 19, 2010 at 11:42 PM, Peter Schuller
> >> <peter.schuller@infidyne.com> wrote:
> >>>
> >>> > What is my "live set"?
> >>>
> >>> Sorry; that meant the "set of data actually live (i.e., not garbage) in
> >>> the heap". In other words, the amount of memory truly "used".
> >>>
> >>> > Is the system CPU bound given the few statements below? This is from
> >>> > running 4 concurrent processes against the node... do I need to
> >>> > throttle back the concurrent read/writers?
> >>> >
> >>> > I do all reads/writes as Quorum. (Replication factor of 3.)
> >>>
> >>> With quorum and 0.6.4 I don't think unthrottled writes are expected to
> >>> cause a problem.
> >>>
> >>> > The memtable threshold is the default of 256.
> >>> >
> >>> > All caching is turned off.
> >>> >
> >>> > The database is pretty small, maybe a few million keys (2-3) in 4
> >>> > CFs. The key size is pretty small. Some of the rows are pretty fat
> >>> > though (fatter than I thought). I am saving secondary indexes in
> >>> > separate CFs and those are the large rows that I think might be part
> >>> > of the problem. I will restart testing with these turned off and see
> >>> > if I see any difference.
> >>> >
> >>> > Would an extra fat row explain repeated OOM crashes in a row? I have
> >>> > finally got the system to stabilize somewhat, and I even ran
> >>> > compaction on the bad node without a problem (still no row size
> >>> > stats).
> >>>
> >>> Based on what you've said so far, the large rows are the only thing I
> >>> would suspect may be the cause. With the amount of data and keys you
> >>> say you have, you should definitely not be having memory issues with
> >>> an 8 gig heap as a direct result of the data size/key count. A few
> >>> million keys is not a lot at all; I still claim you should be able to
> >>> handle hundreds of millions at least, from the perspective of bloom
> >>> filters and such.
> >>>
> >>> So your plan to try it without these large rows is probably a good
> >>> idea unless someone else has a better idea.
> >>>
> >>> You may want to consider trying the 0.7 betas too, since 0.7 has
> >>> removed the limitation with respect to large rows, assuming you do in
> >>> fact want these large rows (see the CassandraLimitations wiki page that
> >>> was posted earlier in this thread).
> >>>
> >>> > I now have several other nodes flapping with the following single
> >>> > error in the cassandra.log:
> >>> > Error: Exception thrown by the agent : java.lang.NullPointerException
> >>> >
> >>> > I assume this is an unrelated problem?
> >>>
> >>> Do you have a full stack trace?
> >>>
> >>> --
> >>> / Peter Schuller
> >>
> >
>
> Just because you are no longer creating the big rows does not mean they
> are no longer affecting you. For example, periodic compaction may still
> run on those keys. Did you delete the keys, and run a major compaction to
> clear the data and tombstones?
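
For completeness, the cleanup being asked about boils down to something like
this on each node (again assuming 0.6-style nodetool flags; note that
compaction only purges tombstones once GCGraceSeconds, 864000 seconds or 10
days by default, has elapsed):

    # Force a major compaction so deleted rows and, after the grace period,
    # their tombstones are rewritten out of the data files
    nodetool -host node1.example.com compact

    # Then re-check the compacted row sizes
    nodetool -host node1.example.com cfstats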