Subject: Re: Node OOM Problems
From: Wayne <wav100@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 20 Aug 2010 20:13:07 +0200

I deleted ALL data and reset the nodes from scratch. There are no more large
rows in there: 8-9 megs MAX across all nodes. This appears to be a new
problem. I restarted the node in question and it seems to be running fine,
but I had to run repair on it as it appears to be missing a lot of data.
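
For anyone hitting the same thing, a rough sketch of the nodetool checks
referred to in this thread (tpstats backlog, compacted row sizes, repair),
assuming 0.6-style "-host" flags and a placeholder host name; exact flag
spellings vary between versions:

    # Per-stage thread pool backlog; pending MESSAGE-DESERIALIZER-POOL tasks show up here
    nodetool -host node1.example.com tpstats

    # Per-column-family statistics, including compacted row sizes (only populated after a compaction)
    nodetool -host node1.example.com cfstats

    # Re-sync a node that came back up missing data
    nodetool -host node1.example.com repair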

On Fri, Aug 20, 2010 at 7:51 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> On Fri, Aug 20, 2010 at 1:17 PM, Wayne <wav100@gmail.com> wrote:
> > I turned off the creation of the secondary indexes which had the large
> > rows and all seemed good. Thank you for the help. I was getting 60k+
> > writes/second on the 6-node cluster.
> >
> > Unfortunately, again three hours later a node went down. I cannot even
> > look at the logs from when it started, since they are gone/recycled due
> > to millions of message deserialization messages. What are these? The
> > node had 12,098,067 pending message-deserializer-pool entries in
> > tpstats. The node was up according to some nodes and down according to
> > others, which left it flapping while still trying to take requests. What
> > is the log warning "deserialization task dropped message"? Why would a
> > node have 12 million of these?
> >
> >  WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-20 16:57:02,602
> > MessageDeserializationTask.java (line 47) dropping message (1,078,378ms
> > past timeout)
> >  WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-20 16:57:02,602
> > MessageDeserializationTask.java (line 47) dropping message (1,078,378ms
> > past timeout)
> >
> > I do not think this is a large row problem any more. All nodes show a
> > max row size around 8-9 megs.
> >
> > I looked at the munin charts, and the disk IO seems to have spiked along
> > with compaction. Could compaction kicking in cause this? I have added
> > the 3 JVM settings to make compaction a lower priority. Did that
> > contribute by slowing compaction down and letting it build up on a
> > heavily loaded system?
> >
> > Thanks in advance for any help someone can provide.
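
For reference, the "3 JVM settings" mentioned above are presumably the
compaction-priority options for 0.6; if so, they amount to appending the
following to JVM_OPTS (assuming bin/cassandra.in.sh is where they are set):

    # Enable thread priorities (the 42 value lets them take effect without root
    # on Linux) and run compaction at the lowest priority, so the read/write
    # stages get scheduled ahead of compaction
    JVM_OPTS="$JVM_OPTS -XX:+UseThreadPriorities"
    JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
    JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1"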

> >
> > On Fri, Aug 20, 2010 at 8:34 AM, Wayne <wav100@gmail.com> wrote:
> >>
> >> The NullPointerException does not crash the node. It only makes it
> >> flap/go down for a short period and then it comes back up. I do not see
> >> anything abnormal in the system log, only that single error in the
> >> cassandra.log.
> >>
> >>
> >> On Thu, Aug 19, 2010 at 11:42 PM, Peter Schuller
> >> <peter.schuller@infidyne.com> wrote:
> >>>
> >>> > What is my "live set"?
> >>>
> >>> Sorry; that meant the "set of data actually live (i.e., not garbage) in
> >>> the heap". In other words, the amount of memory truly "used".
> >>>
> >>> > Is the system CPU bound given the few statements below? This is from
> >>> > running 4 concurrent processes against the node... do I need to
> >>> > throttle back the concurrent read/writers?
> >>> >
> >>> > I do all reads/writes as Quorum. (Replication factor of 3.)
> >>>
> >>> With quorum and 0.6.4 I don't think unthrottled writes are expected to
> >>> cause a problem.
> >>>
> >>> > The memtable threshold is the default of 256.
> >>> >
> >>> > All caching is turned off.
> >>> >
> >>> > The database is pretty small, maybe a few million keys (2-3) in 4
> >>> > CFs. The key size is pretty small. Some of the rows are pretty fat
> >>> > though (fatter than I thought). I am saving secondary indexes in
> >>> > separate CFs and those are the large rows that I think might be part
> >>> > of the problem. I will restart testing with these turned off and see
> >>> > if I see any difference.
> >>> >
> >>> > Would an extra fat row explain repeated OOM crashes in a row? I have
> >>> > finally got the system to stabilize somewhat, and I even ran
> >>> > compaction on the bad node without a problem (still no row size
> >>> > stats).
> >>>
> >>> Based on what you've said so far, the large rows are the only thing I
> >>> would suspect may be the cause. With the amount of data and keys you
> >>> say you have, you should definitely not be having memory issues with
> >>> an 8 gig heap as a direct result of the data size/key count. A few
> >>> million keys is not a lot at all; I still claim you should be able to
> >>> handle hundreds of millions at least, from the perspective of bloom
> >>> filters and such.
> >>>
> >>> So your plan to try it without these large rows is probably a good
> >>> idea unless someone else has a better idea.
> >>>
> >>> You may want to consider trying the 0.7 betas too, since 0.7 has
> >>> removed the limitation with respect to large rows, assuming you do in
> >>> fact want these large rows (see the CassandraLimitations wiki page that
> >>> was posted earlier in this thread).
> >>>
> >>> > I now have several other nodes flapping with the following single
> >>> > error in the cassandra.log:
> >>> > Error: Exception thrown by the agent : java.lang.NullPointerException
> >>> >
> >>> > I assume this is an unrelated problem?
> >>>
> >>> Do you have a full stack trace?
> >>>
> >>> --
> >>> / Peter Schuller
> >>
> >
>
> Just because you are no longer creating the big rows does not mean they
> are no longer affecting you. For example, periodic compaction may still
> run on those keys. Did you delete the keys, and run a major compaction to
> clear the data and tombstones?
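
For completeness, the cleanup being asked about boils down to something like
this on each node (again assuming 0.6-style nodetool flags; note that
compaction only purges tombstones once GCGraceSeconds, 864000 seconds or 10
days by default, has elapsed):

    # Force a major compaction so deleted rows and, after the grace period,
    # their tombstones are rewritten out of the data files
    nodetool -host node1.example.com compact

    # Then re-check the compacted row sizes
    nodetool -host node1.example.com cfstats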