Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 73939D8D5 for ; Tue, 18 Sep 2012 03:28:03 +0000 (UTC) Received: (qmail 26051 invoked by uid 500); 18 Sep 2012 03:28:01 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 25576 invoked by uid 500); 18 Sep 2012 03:27:59 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 25542 invoked by uid 99); 18 Sep 2012 03:27:58 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Sep 2012 03:27:58 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [208.113.200.5] (HELO homiemail-a57.g.dreamhost.com) (208.113.200.5) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Sep 2012 03:27:50 +0000 Received: from homiemail-a57.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a57.g.dreamhost.com (Postfix) with ESMTP id A547C208073 for ; Mon, 17 Sep 2012 20:27:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=thelastpickle.com; h=from :content-type:message-id:mime-version:subject:date:references:to :in-reply-to; s=thelastpickle.com; bh=ssuM5eqt20IHKZqA7ibaCA7Kyh M=; b=sy+OM5PhDDsyOxf7f237mioHbiTjTUQwNKqfjOyk9MR3UPjuZFXODjZS+g e/hx77lrTtDbq9JdcNxTRzCVVErOOWUk7rqzz3gvyBeGLnks/Lqt4hsHK9E+v00D M2y3TlPCsdkJ/9DLhHJHD7kuER/CZnYYe5s/fZ5FIgUPelM4E= Received: from [192.168.2.77] (unknown [116.90.132.105]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: aaron@thelastpickle.com) by homiemail-a57.g.dreamhost.com (Postfix) with ESMTPSA id A36F6208070 for ; Mon, 17 Sep 2012 20:27:27 -0700 (PDT) From: aaron morton Content-Type: multipart/alternative; boundary="Apple-Mail=_6122070A-B774-4DF6-B9BC-53594BCFC8E6" Message-Id: <355AF4BB-E2E2-4C38-BA7C-2E95888F8A1A@thelastpickle.com> Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\)) Subject: Re: Cassandra Messages Dropped Date: Tue, 18 Sep 2012 15:27:25 +1200 References: <50288D5C-42ED-40B0-9834-952714E6C50C@yahoo.com> To: user@cassandra.apache.org In-Reply-To: <50288D5C-42ED-40B0-9834-952714E6C50C@yahoo.com> X-Mailer: Apple Mail (2.1486) --Apple-Mail=_6122070A-B774-4DF6-B9BC-53594BCFC8E6 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java = (line 72) MemtablePostFlusher 1 5 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java = (line 72) FlushWriter 1 5 0 Looks suspiciously like=20 = http://mail-archives.apache.org/mod_mbox/cassandra-user/201209.mbox/%3C9FB= 0E801-B1ED-41C4-9939-BAFBDDF15FD9@thelastpickle.com%3E What version are you on ?=20 Are there any ERROR log messages before this ?=20 Are you seeing MutationStage back up ?=20 Are you see log messages from GCInspector ? Cheers ----------------- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/09/2012, at 2:16 AM, Michael Theroux wrote: > Hello, >=20 > While under load, we have occasionally been seeing "messages dropped" = errors in our cassandra log. Doing some research, I understand this is = part of Cassandra's design to shed load, and we should look at the = tpstats-like output to determine what should be done to resolve the = situation. Typically, you will see lots of messages blocked or pending, = and that might be an indicator that a specific part of hardware needs to = be improved/tuned/upgraded. =20 >=20 > However, looking at the output we are getting, I'm finding it = difficult to see what needs to be tuned, as it looks to me cassandra is = handling the load within the mutation stage: >=20 > NFO [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java = (line 658) 119 MUTATION messages dropped in last 5000ms > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java = (line 57) Pool Name Active Pending Blocked > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 StatusLogger.java = (line 72) ReadStage 3 3 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java = (line 72) RequestResponseStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java = (line 72) ReadRepairStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java = (line 72) MutationStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java = (line 72) ReplicateOnWriteStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java = (line 72) GossipStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java = (line 72) AntiEntropyStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java = (line 72) MigrationStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java = (line 72) StreamStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java = (line 72) MemtablePostFlusher 1 5 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java = (line 72) FlushWriter 1 5 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java = (line 72) MiscStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java = (line 72) commitlog_archiver 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java = (line 72) InternalResponseStage 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java = (line 72) AntiEntropySessions 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java = (line 72) HintedHandoff 0 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java = (line 77) CompactionManager 0 0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java = (line 89) MessagingService n/a 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java = (line 99) Cache Type Size Capacity = KeysToSave = Provider > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java = (line 100) KeyCache 2184533 2184533 = all = =20 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java = (line 106) RowCache 0 0 = all = org.apache.cassandra.cache.SerializingCacheProvider > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java = (line 113) ColumnFamily Memtable ops,data > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java = (line 116) system.NodeIdInfo 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java = (line 116) system.IndexInfo 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java = (line 116) system.LocationInfo 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java = (line 116) system.Versions 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java = (line 116) system.schema_keyspaces 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java = (line 116) system.Migrations 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java = (line 116) system.schema_columnfamilies 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java = (line 116) system.schema_columns 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java = (line 116) system.HintsColumnFamily 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java = (line 116) system.Schema 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java = (line 116) prod.comp 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java = (line 116) prod.byp 224,61863 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java = (line 116) prod.byn 30,6106 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java = (line 116) prod.p 3354,3792954 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java = (line 116) prod.rel 788,153294 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java = (line 116) prod.i 46,271517 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java = (line 116) prod.users 18489,5650405 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java = (line 116) prod.caches 40,7512998 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java = (line 116) prod.sessions 8085,13063121 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java = (line 116) prod.content 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java = (line 116) prod.enr 436,341190 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,860 StatusLogger.java = (line 116) prod.byt 23800,9880339 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java = (line 116) prod.aliases 63603,10475769 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java = (line 116) prod.ran 1500,329387 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java = (line 116) prod.cmap 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java = (line 116) prod.pmap 73032,61237450 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java = (line 116) prod.pic 5421,4438066 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java = (line 116) prod.watchdog 0,0 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java = (line 116) prod.s 31,40924 > INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 StatusLogger.java = (line 116) prod.m 21690,64262284 >=20 >=20 > We are using a write consistency level of LOCAL_QUROUM over a six node = cluster. Any observations are welcome, >=20 > Thanks, > -Mike --Apple-Mail=_6122070A-B774-4DF6-B9BC-53594BCFC8E6 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,839 StatusLogger.java (line 72) MemtablePostFlusher =             &n= bsp; 1         5 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) = FlushWriter =             &n= bsp;         1 =         5 =         0Looks = suspiciously like 

Wha= t version are you on ? 

Are there any ERROR log = messages before this ? 

Are you seeing = MutationStage back up ? 

Are you see log = messages from GCInspector = ?

Cheers

http://www.thelastpickle.com

On 18/09/2012, at 2:16 AM, Michael Theroux <mtheroux2@yahoo.com> = wrote:

Hello,

While under load, we have occasionally been = seeing "messages dropped" errors in our cassandra log.  Doing some = research, I understand this is part of Cassandra's design to shed load, = and we should look at the tpstats-like output to determine what should = be done to resolve the situation.  Typically, you will see lots of = messages blocked or pending, and that might be an indicator that a = specific part of hardware needs to be improved/tuned/upgraded. =  

However, looking at the output we are getting, I'm finding = it difficult to see what needs to be tuned, as it looks to me cassandra = is handling the load within the mutation stage:

NFO = [ScheduledTasks:1] 2012-09-17 06:28:03,266 MessagingService.java (line = 658) 119 MUTATION messages dropped in last 5000ms
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,645 StatusLogger.java (line 57) = Pool Name =             &n= bsp;      Active   Pending =   Blocked
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,836 = StatusLogger.java (line 72) ReadStage =             &n= bsp;           3 =         3 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) = RequestResponseStage =             &n= bsp;0         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) = ReadRepairStage =             &n= bsp;     0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,837 StatusLogger.java (line 72) = MutationStage =             &n= bsp;       0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) = ReplicateOnWriteStage =             0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,838 StatusLogger.java (line 72) = GossipStage =             &n= bsp;         0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) = AntiEntropyStage =             &n= bsp;    0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) = MigrationStage =             &n= bsp;      0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) = StreamStage =             &n= bsp;         0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,839 StatusLogger.java (line 72) = MemtablePostFlusher =             &n= bsp; 1         5 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) = FlushWriter =             &n= bsp;         1 =         5 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) = MiscStage =             &n= bsp;           0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,840 StatusLogger.java (line 72) = commitlog_archiver =             &n= bsp;  0         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) = InternalResponseStage =             0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,841 StatusLogger.java (line 72) = AntiEntropySessions =             &n= bsp; 0         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 72) = HintedHandoff =             &n= bsp;       0 =         0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,851 StatusLogger.java (line 77) = CompactionManager =             &n= bsp;   0 =         0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 89) = MessagingService =             &n= bsp;  n/a       0,0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,852 StatusLogger.java (line 99) = Cache Type =             &n= bsp;       Size =             &n= bsp;   Capacity =             &n= bsp; KeysToSave =             &n= bsp;           &nbs= p;            =             &n= bsp;      Provider
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 100) = KeyCache =             &n= bsp;      2184533 =             &n= bsp;    2184533 =             &n= bsp;        all =             &n= bsp;           &nbs= p;            =             &n= bsp;           &nbs= p;  
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,853 = StatusLogger.java (line 106) RowCache =             &n= bsp;           &nbs= p;0 =             &n= bsp;          0 =             &n= bsp;        all =             &n= bsp;org.apache.cassandra.cache.SerializingCacheProvider
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,853 StatusLogger.java (line 113) = ColumnFamily =             &n= bsp;  Memtable ops,data
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,853 StatusLogger.java (line 116) system.NodeIdInfo =             &n= bsp;           0,0<= br> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java = (line 116) system.IndexInfo =             &n= bsp;           &nbs= p;0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,854 = StatusLogger.java (line 116) system.LocationInfo =             &n= bsp;         0,0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,854 StatusLogger.java (line 116) = system.Versions =             &n= bsp;           &nbs= p; 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 = StatusLogger.java (line 116) system.schema_keyspaces =             &n= bsp;     0,0
INFO [ScheduledTasks:1] = 2012-09-17 06:28:03,855 StatusLogger.java (line 116) system.Migrations =             &n= bsp;           0,0<= br> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,855 StatusLogger.java = (line 116) system.schema_columnfamilies =             &n= bsp;   0,0
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,856 StatusLogger.java (line 116) system.schema_columns =             &n= bsp;       0,0
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,856 StatusLogger.java (line 116) = system.HintsColumnFamily =             &n= bsp;    0,0
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,856 StatusLogger.java (line 116) system.Schema =             &n= bsp;           &nbs= p;   0,0
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,857 StatusLogger.java (line 116) prod.comp =             &n= bsp;           &nbs= p; 0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,857 = StatusLogger.java (line 116) prod.byp =             &n= bsp;        224,61863
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) = prod.byn =             &n= bsp;        30,6106
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,857 StatusLogger.java (line 116) = prod.p =             &n= bsp;       3354,3792954
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,858 StatusLogger.java (line 116) = prod.rel =             &n= bsp;  788,153294
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,858 StatusLogger.java (line 116) prod.i =             &n= bsp;          46,271517<= br> INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java = (line 116) prod.users =             &n= bsp;       18489,5650405
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line 116) = prod.caches =             &n= bsp;         40,7512998
= INFO [ScheduledTasks:1] 2012-09-17 06:28:03,859 StatusLogger.java (line = 116) prod.sessions =             &n= bsp;    8085,13063121
INFO [ScheduledTasks:1] = 2012-09-17 06:28:03,859 StatusLogger.java (line 116) prod.content =             &n= bsp;           &nbs= p;   0,0
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,860 StatusLogger.java (line 116) prod.enr =             &n= bsp;    436,341190
INFO [ScheduledTasks:1] = 2012-09-17 06:28:03,860 StatusLogger.java (line 116) prod.byt =             &n= bsp;   23800,9880339
INFO [ScheduledTasks:1] = 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.aliases =             &n= bsp;    63603,10475769
INFO [ScheduledTasks:1] = 2012-09-17 06:28:03,861 StatusLogger.java (line 116) prod.ran =             &n= bsp;      1500,329387
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line 116) = prod.cmap =             &n= bsp;          0,0
= INFO [ScheduledTasks:1] 2012-09-17 06:28:03,861 StatusLogger.java (line = 116) prod.pmap =             &n= bsp;  73032,61237450
INFO [ScheduledTasks:1] 2012-09-17 = 06:28:03,862 StatusLogger.java (line 116) prod.pic =             &n= bsp;     5421,4438066
INFO = [ScheduledTasks:1] 2012-09-17 06:28:03,862 StatusLogger.java (line 116) = prod.watchdog =             &n= bsp;           &nbs= p;  0,0
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 = StatusLogger.java (line 116) prod.s =             &n= bsp;           31,4= 0924
INFO [ScheduledTasks:1] 2012-09-17 06:28:03,863 = StatusLogger.java (line 116) prod.m =             &n= bsp;   21690,64262284


We are using a write = consistency level of LOCAL_QUROUM over a six node cluster.  Any = observations are = welcome,

Thanks,
-Mike

= --Apple-Mail=_6122070A-B774-4DF6-B9BC-53594BCFC8E6--