Subject: Re: Cassandra dying when gets many deletes
From: crypto five <cryptofive@gmail.com>
To: Vitalii Tymchyshyn
Cc: user@cassandra.apache.org
Date: Tue, 24 Apr 2012 23:08:03 -0700
I agree with your observations.

On the other hand, I found that ColumnFamily.size() doesn't calculate the object size correctly: it doesn't count the sizes of two Object fields and returns 0 if there are no objects in the columns container.
I increased the initial size variable to 24, which is the size of those two objects (I didn't know what the correct value should be), and Cassandra started calculating the live ratio correctly, increasing the throughput value and flushing memtables.
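
To make the accounting issue concrete, here is a minimal sketch of the pattern described above; it is not the actual Cassandra source, and the class name and the 24-byte constant are illustrative assumptions.

// Illustrative sketch only -- not the real ColumnFamily code. If size() starts
// from 0 and only sums the live columns, a row that carries nothing but a
// row-level delete reports 0 bytes, and the live-ratio math divides by zero.
import java.util.ArrayList;
import java.util.List;

public class SizeAccountingSketch {
    // Assumed overhead of the two object-typed fields (12 bytes each).
    private static final int TWO_OBJECT_FIELDS = 24;

    private final List<byte[]> columns = new ArrayList<byte[]>();

    // Before: an empty columns container (full-row delete) reports 0.
    int sizeBuggy() {
        int size = 0;
        for (byte[] column : columns)
            size += column.length;
        return size;
    }

    // After: start from the fixed field overhead, so the result is never 0.
    int sizePatched() {
        int size = TWO_OBJECT_FIELDS;
        for (byte[] column : columns)
            size += column.length;
        return size;
    }
}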

On Tue, Apr 24, 2012 at 2:00 AM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> Hello.
>
> For me, "there are no dirty column families" in your message suggests it's possibly the same problem.
> The issue is that column families that get only full-row deletes do not get a SINGLE dirty byte accounted, and so can't be picked by the flusher. No ratio can help, simply because it is multiplied by 0. Check your cfstats.
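
A rough sketch of the selection logic being described (illustrative only, not the real flusher code; the class and field names are made up): with zero accounted dirty bytes, multiplying by any live ratio still yields zero, so the column family never becomes a flush candidate.

// Illustrative only: why a zero dirty-byte count defeats any live ratio.
import java.util.List;

class FlushCandidate {
    final String name;
    final long dirtyBytes;   // accounted bytes; 0 for delete-only column families
    final double liveRatio;  // estimated in-memory size per serialized byte

    FlushCandidate(String name, long dirtyBytes, double liveRatio) {
        this.name = name;
        this.dirtyBytes = dirtyBytes;
        this.liveRatio = liveRatio;
    }

    long estimatedHeapUse() {
        return (long) (dirtyBytes * liveRatio); // 0 * anything == 0
    }

    // Picks the biggest consumer, or null -- "no dirty column families".
    static FlushCandidate pickLargest(List<FlushCandidate> candidates) {
        FlushCandidate best = null;
        for (FlushCandidate c : candidates) {
            if (c.estimatedHeapUse() == 0)
                continue; // delete-only CFs are skipped here forever
            if (best == null || c.estimatedHeapUse() > best.estimatedHeapUse())
                best = c;
        }
        return best;
    }
}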

> 24.04.12 09:54, crypto five wrote:
>
> Thank you, Vitalii.
>
> Looking at Jonathan's answer to your patch, I think it's probably not my case. I see that liveRatio is calculated in my case, but the calculations look strange:
>
> WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181) setting live ratio to maximum of 64 instead of Infinity
> INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186) CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0 (just-counted was 64.0). calculation took 63355ms for 0 columns
>
> Looking at the comments in the code ("If it gets higher than 64 something is probably broken."), this looks like it's probably the problem.
> Not sure how to investigate it.
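
The quoted log lines are consistent with the ratio being computed roughly as measured heap bytes divided by serialized ("throughput") bytes; with zero serialized bytes the division yields Infinity, which is then capped at 64. A small sketch of that computation (illustrative only, not the real Memtable code):

// Sketch of the computation behind the quoted log lines (not the real code).
class LiveRatioSketch {
    static final double MAX_LIVE_RATIO = 64.0;

    static double liveRatio(long measuredHeapBytes, long serializedBytes) {
        // With serializedBytes == 0 (e.g. 0 columns counted) this is Infinity.
        double ratio = (double) measuredHeapBytes / serializedBytes;
        if (ratio > MAX_LIVE_RATIO) {
            System.err.println("setting live ratio to maximum of " + MAX_LIVE_RATIO
                    + " instead of " + ratio);
            return MAX_LIVE_RATIO;
        }
        return ratio;
    }

    public static void main(String[] args) {
        System.out.println(liveRatio(128L << 20, 0)); // prints 64.0
    }
}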

> 2012/4/23 Vitalii Tymchyshyn <tivv00@gmail.com>
>
>> See https://issues.apache.org/jira/browse/CASSANDRA-3741
>> I did post a fix there that helped me.
>>
>> 2012/4/24 crypto five <cryptofive@gmail.com>
>>> Hi,
>>>
>>> I have 50 million rows in a column family on a 4 GB RAM box. I allocated 2 GB to Cassandra.
>>> I have a program which traverses this CF and cleans some data there; it generates about 20k delete statements per second.
>>> After about 3 million deletions Cassandra stops responding to queries: it doesn't react to the CLI, nodetool, etc.
>>> I see in the logs that it tries to free some memory but can't, even if I wait a whole day.
>>> Also I see the following in the logs:
>>>
>>> INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java (line 2647) Unable to reduce heap usage since there are no dirty column families
>>>
>>> When I look at the memory dump I see that memory goes to ConcurrentSkipListMap (10%), HeapByteBuffer (13%), DecoratedKey (6%), int[] (6%), BigInteger (8.2%), ConcurrentSkipListMap$HeadIndex (7.2%), ColumnFamily (6.5%), ThreadSafeSortedColumns (13.7%), long[] (5.9%).
>>>
>>> What can I do to make Cassandra stop dying?
>>> Why can't it free the memory?
>>> Any ideas?
>>>
>>> Thank you.



>> --
>> Best regards,
>> Vitalii Tymchyshyn
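
For scale, a back-of-envelope estimate using the numbers from the quoted thread (the ~300 bytes per retained row tombstone is an assumption for illustration, not a measured figure): if the delete-only memtable is never flushed, a few million row tombstones plus their keys and skip-list nodes can pin a large fraction of a 2 GB heap, which fits the memory-dump breakdown above.

// Back-of-envelope only; the per-tombstone figure is assumed, not measured.
public class TombstoneHeapEstimate {
    public static void main(String[] args) {
        long deletions = 3000000L;                // deletes applied before the node froze
        long bytesPerTombstone = 300L;            // assumed: key bytes + DecoratedKey + map nodes + ColumnFamily
        long heapBytes = 2L * 1024 * 1024 * 1024; // 2 GB allocated to Cassandra

        long used = deletions * bytesPerTombstone;
        System.out.printf("~%d MB of heap, ~%.0f%% of the 2 GB allocation%n",
                used / (1024 * 1024), 100.0 * used / heapBytes);
    }
}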


