From: Thorvaldsson Justus <justus.thorvaldsson@svenskaspel.se>
To: user@cassandra.apache.org
Date: Tue, 27 Jul 2010 09:01:47 +0200
Subject: Re: what causes MESSAGE-DESERIALIZER-POOL to spike

AFAIK you could use more nodes and read from them in parallel, making your read rate go up. Also, not writing and reading to the same disk may help some. It's not so much about "Cassandra's" read rate as what your hardware can manage.

 

/Justus
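(For illustration only: a minimal sketch of the "more nodes, read in parallel" idea, fanning keys out over one connection per node with a bounded thread pool. NodeClient and its get() are placeholders for whatever client/Thrift calls the application really uses, not an actual Cassandra API.)

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

class NodeClient:
    """Placeholder for a real connection to one Cassandra node."""
    def __init__(self, host):
        self.host = host

    def get(self, key):
        # A real client would issue the read here (e.g. over Thrift).
        raise NotImplementedError

def parallel_read(keys, hosts, workers=16):
    # Round-robin keys over one client per node, bounded by a thread pool.
    clients = cycle([NodeClient(h) for h in hosts])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {key: pool.submit(next(clients).get, key) for key in keys}
        return {key: f.result() for key, f in futures.items()}
```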

 

From: Dathan Pattishall [mailto:dathanvp@gmail.com]
Sent: 27 July 2010 08:56
To: user@cassandra.apache.org
Subject: Re: what causes MESSAGE-DESERIALIZER-POOL to spike

 

 

On Mon, Jul 26, 2010 at 8:30 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

MDP is backing up because RRS is full at 4096.  This means you're not
able to process reads as quickly as the requests are coming in.  Make
whatever is doing those reads be less aggressive.


So, for Cassandra to function correctly I need to throttle my reads? What request rate is ideal? 100s of reads a second, 1000s? For me I would love to do 100s of thousands of reads a second. Is Cassandra not suited for this?
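("Less aggressive" usually just means bounding the read rate or concurrency on the client side. As a rough illustration, a tiny pacer like the sketch below caps requests per second; the 2000/s figure and read_key() are placeholders, not recommendations or a real API.)

```python
import time

class Pacer:
    """Tiny client-side throttle: spaces calls so at most `rate` happen per second."""
    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.next_allowed = time.monotonic()

    def wait(self):
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.min_interval

# Usage sketch: cap reads at a placeholder 2000/s; read_key() stands in for the real call.
# pacer = Pacer(2000)
# for key in keys:
#     pacer.wait()
#     read_key(key)
```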

 

As to why the reads are slow in the first place, usually this means
you are disk i/o bound.  Posting your cfstats can help troubleshoot but is no substitute for thinking about your application workload.


How should I think about my application workload? I use Cassandra as a distributed hash table, accessing it by individual keys (O(1)). I randomly hit a node through an F5 load balancer, using the storage CF definition as defined in the sample storage-conf.xml. Each key is no more than 30 bytes, the value is a timestamp. I store a total of 20 million keys and update 1.5 million keys a day. Is there anything else I should really think about? What are the limitations in Cassandra that would affect this workload?
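(As a back-of-envelope on those numbers, assuming the updates are spread evenly over the day, which they may not be, the write load is tiny compared to the hoped-for read rate, so reads are the only thing worth sizing for:)

```python
total_keys    = 20_000_000           # keys stored
updates_daily = 1_500_000            # keys updated per day
key_bytes     = 30                   # max key size from the description

avg_writes_per_sec = updates_daily / 86_400        # seconds per day
raw_key_megabytes  = total_keys * key_bytes / 1e6  # keys only, before values/overhead

print(f"average write rate: ~{avg_writes_per_sec:.0f} writes/s")
print(f"raw key data: ~{raw_key_megabytes:.0f} MB (excluding values, indexes, overhead)")
```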
 



 


On Mon, Jul 26, 2010 at 12:32 PM, Anthony Molinaro
<anthonym@alumni.caltech.edu> wrote:
> It's usually I/O which causes backup in MESSAGE-DESERIALIZER-POOL.  You
> should check iostat and see what it looks like.  It may be that you
> need more nodes in order to deal with the read/write rate.  You can also
> use JMX to get latency values on reads and writes and see if the backup
> has a corresponding increase in latency.  You may be able to get more
> out of your hardware and memory with row caching but that really depends
> on your data set.
>
> -Anthony
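(To make "see if the backup has a corresponding increase" concrete: a small polling loop over the same nodetool tpstats output quoted below can track the Pending column for the interesting stages over time. The paths, host/port, and column layout are taken from this thread and may differ on other installs; this is only a sketch.)

```python
import subprocess, time

STAGES = ("ROW-READ-STAGE", "MESSAGE-DESERIALIZER-POOL")

def pending_counts():
    # Run the same command shown in this thread; adjust host/port/path as needed.
    out = subprocess.run(
        ["/opt/cassandra/bin/nodetool", "--host=localhost", "--port=8181", "tpstats"],
        capture_output=True, text=True, check=True).stdout
    counts = {}
    for line in out.splitlines():
        parts = line.split()
        if parts and parts[0] in STAGES:
            counts[parts[0]] = int(parts[2])  # columns: name, active, pending, completed
    return counts

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), pending_counts())
        time.sleep(10)
```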
>
> On Mon, Jul 26, 2010 at 12:22:46PM -0700, Dathan Pattishall wrote:
>> I have 4 nodes on enterprise-type hardware (lots of RAM: 12GB, 16 i7 cores,
>> RAID Disks).
>>
>> ~# /opt/cassandra/bin/nodetool --host=localhost --port=8181 tpstats
>> Pool Name                    Active   Pending      Completed
>> STREAM-STAGE                      0         0              0
>> RESPONSE-STAGE                    0         0         516280
>> ROW-READ-STAGE                    8      4096        1164326
>> LB-OPERATIONS                     0         0              0
>> *MESSAGE-DESERIALIZER-POOL        1    682008        1818682*
>> GMFD                              0         0           6467
>> LB-TARGET                         0         0              0
>> CONSISTENCY-MANAGER               0         0         661477
>> ROW-MUTATION-STAGE                0         0         998780
>> MESSAGE-STREAMING-POOL            0         0              0
>> LOAD-BALANCER-STAGE               0         0              0
>> FLUSH-SORTER-POOL                 0         0              0
>> MEMTABLE-POST-FLUSHER             0         0              4
>> FLUSH-WRITER-POOL                 0         0              4
>> AE-SERVICE-STAGE                  0         0              0
>> HINTED-HANDOFF-POOL               0         0              3
>>
>> EQX root@cass04:~# vmstat -n 1
>>
>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>>  r  b   swpd   free   buff    cache   si   so    bi    bo    in     cs us sy id wa st
>>  6 10   7096 121816  16244 10375492    0    0     1     3     0      0  5  1 94  0  0
>>  2 10   7096 116484  16248 10381144    0    0  5636     4 21210   9820  2  1 79 18  0
>>  1  9   7096 108920  16248 10387592    0    0  6216     0 21439   9878  2  1 81 16  0
>>  0  9   7096 129108  16248 10364852    0    0  6024     0 23280   8753  2  1 80 17  0
>>  2  9   7096 122460  16248 10370908    0    0  6072     0 20835   9461  2  1 83 14  0
>>  2  8   7096 115740  16260 10375752    0    0  5168   292 21049   9511  3  1 77 20  0
>>  1 10   7096 108424  16260 10382300    0    0  6244     0 21483   8981  2  1 75 22  0
>>  3  8   7096 125028  16260 10364104    0    0  5584     0 21238   9436  2  1 81 16  0
>>  3  9   7096 117928  16260 10370064    0    0  5988     0 21505  10225  2  1 77 19  0
>>  1  8   7096 109544  16260 10376640    0    0  6340    28 20840   8602  2  1 80 18  0
>>  0  9   7096 127028  16240 10357652    0    0  5984     0 20853   9158  2  1 79 18  0
>>  9  0   7096 121472  16240 10363492    0    0  5716     0 20520   8489  1  1 82 16  0
>>  3  9   7096 112668  16240 10369872    0    0  6404     0 21314   9459  2  1 84 13  0
>>  1  9   7096 127300  16236 10353440    0    0  5684     0 38914  10068  2  1 76 21  0
>>
>>
>> *But the 16 cores are hardly utilized, which indicates to me there is some
>> bad thread thrashing. But why?*
>>
>>
>>
>>   1  [|||||                                         8.3%]     Tasks: 1070 total, 1 running
>>   2  [                                              0.0%]     Load average: 8.34 9.05 8.82
>>   3  [                                              0.0%]     Uptime: 192 days(!), 15:29:52
>>   4  [|||||||||||                                  17.9%]
>>   5  [|||||                                         5.7%]
>>   6  [||                                            1.3%]
>>   7  [||                                            2.6%]
>>   8  [|                                             0.6%]
>>   9  [|                                             0.6%]
>>   10 [||                                            1.9%]
>>   11 [||                                            1.9%]
>>   12 [||                                            1.9%]
>>   13 [||                                            1.3%]
>>   14 [|                                             0.6%]
>>   15 [||                                            1.3%]
>>   16 [|                                             0.6%]
>>   Mem[||||||||||||||||||||||||||||||||||||||||||||1791/12028MB]
>>   Swp[|                                              6/1983MB]
>>
>>   PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>> 30269 root      40   0 14100  2116   900 R  4.0  0.0  0:00.49 htop
>> 24878 root      40   0 20.6G 8345M 6883M D  3.0 69.4  1:23.03
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24879 root      40   0 20.6G 8345M 6883M D  3.0 69.4  1:22.93
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24874 root      40   0 20.6G 8345M 6883M D  2.0 69.4  1:22.73
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24880 root      40   0 20.6G 8345M 6883M D  2.0 69.4  1:22.93
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24875 root      40   0 20.6G 8345M 6883M D  2.0 69.4  1:23.17
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24658 root      40   0 20.6G 8345M 6883M D  2.0 69.4  1:23.06
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24877 root      40   0 20.6G 8345M 6883M S  2.0 69.4  1:23.43
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24873 root      40   0 20.6G 8345M 6883M D  1.0 69.4  1:23.65
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24876 root      40   0 20.6G 8345M 6883M S  1.0 69.4  1:23.62
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24942 root      40   0 20.6G 8345M 6883M S  1.0 69.4  0:23.50
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24943 root      40   0 20.6G 8345M 6883M S  0.0 69.4  0:29.53
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24933 root      40   0 20.6G 8345M 6883M S  0.0 69.4  0:22.57
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 24939 root      40   0 20.6G 8345M 6883M S  0.0 69.4  0:12.73
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>> 25280 root      40   0 20.6G 8345M 6883M S  0.0 69.4  0:00.10
>> /opt/java/bin/java -ea -Xms1G -Xmx7G -XX:+UseParNewGC -XX:+UseConcMark
>
> --
> --------------------------------------------------------------------------
> Anthony Molinaro                          <anthonym@alumni.caltech.edu>
>


--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

 
