From: Mark Jones <MJones@imagehawk.com>
To: user@cassandra.apache.org
Date: Wed, 5 May 2010 18:36:14 -0500
Subject: RE: performance tuning - where does the slowness come from?
Have you actually managed to get 10K reads/second, or are you just estimating that you can?  I've run into similar issues, but I never got reads to scale when searching for unique keys even using 40 threads; I did discover that using 80+ threads, I can actually reduce performance.  I've never gotten more than 200-300 reads/second (steady state) off a 4-node cluster.  I can get roughly 8K writes/second to the same cluster (although I haven't tested both simultaneously with results worth talking about).

From: Ran Tavory [mailto:rantav@gmail.com]
Sent: Wednesday, May 05, 2010 4:59 PM
To: user@cassandra.apache.org
Subject: Re: performance tuning - where does the slowness come from?

let's see if I can make some assertions, feel free to correct me...

Well, obviously, reads are much slower in cassandra than writes, everyone knows that, but by which factor?

In my case I read/write only one column at a time. Key, column and value are pretty small (< 200b).

So the numbers are usually - Write Latency: ~0.05ms, Read Latency: ~30ms. So it looks like reads are 60x slower, at least on my hardware. This happens when the cache is cold. If the cache is warm reads are better, but unfortunately my cache is usually cold...

If my application keeps reading a cold cache there's nothing I can do to make reads faster from cassandra's side. With one client thread this implies 1000/30 = 33 reads/sec, not great.
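Ran's arithmetic generalizes to a simple back-of-the-envelope model (a sketch only; the function name is mine, real throughput flattens once the disks or the cluster saturate, and the thread counts are illustrative):

```python
def reads_per_sec(latency_ms, threads=1):
    """Ideal throughput for `threads` concurrent readers, each
    blocked ~latency_ms per request (ignores saturation effects)."""
    return threads * 1000.0 / latency_ms

print(reads_per_sec(30))       # one thread at 30ms: ~33 reads/sec
print(reads_per_sec(30, 40))   # 40 threads, if latency holds: ~1333 reads/sec
```

This is exactly why a cold-cache 30ms latency is a per-thread ceiling, not a cluster-wide one.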
However, although read latency is a bottleneck, read throughput isn't. So I need to add more reading threads, and I can actually add many of them before read latency starts degrading; according to ycsb I can have ~10000 reads/sec (on the cluster they tested, numbers may vary) before read latency starts degrading. So numbers may vary by cluster size, hardware, data size etc, but the idea is: if read latency is 30ms and the cache is a miss most of the time, that's normal; just add more reader threads and you get better throughput.

Sorry if this sounds trivial, I was just trying to improve on the 30ms reads until I realized I actually can't...

On Wed, May 5, 2010 at 7:08 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
- your key cache isn't warm.  capacity 17M, size 0.5M, 468083 reads
sounds like most of your reads have been for unique keys.
- the kind of reads you are doing can have a big effect (mostly
number of columns you are asking for).  column index granularity plays
a role (for non-rowcached reads); so can column comparator (see e.g.
https://issues.apache.org/jira/browse/CASSANDRA-1043)
- the slow system reads are all on HH rows, which can get very wide
(hence, slow to read the whole row, which is what the HH code does).
clean those out either by bringing back the nodes it's hinting for, or
just removing the HH data files.

On Wed, May 5, 2010 at 10:19 AM, Ran Tavory <rantav@gmail.com> wrote:
> I'm still trying to figure out where my slowness is coming from...
> By now I'm pretty sure it's the reads that are slow, but not sure how to improve them.
> I'm looking at cfstats. Can you say if there are better configuration options? So far I've used all default settings, except for:
>
>     <Keyspace Name="outbrain_kvdb">
>       <ColumnFamily CompareWith="BytesType" Name="KvImpressions" KeysCached="50%"/>
>       <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackAwareStrategy</ReplicaPlacementStrategy>
>       <ReplicationFactor>2</ReplicationFactor>
>       <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
>     </Keyspace>
>
> What does a good read latency look like?
> I was expecting 10ms, however so far it seems that my KvImpressions read latency is 30ms and in the system keyspace I have 800ms :(
> I thought adding KeysCached="50%" would improve my situation but unfortunately looks like the hit rate is about 0. I realize that's application specific, but maybe there are other magic bullets...
> Is there something like adding cache to the system keyspace? 800 ms is pretty bad, isn't it?
> See stats below and thanks.
>
> Keyspace: outbrain_kvdb
>         Read Count: 651668
>         Read Latency: 34.18622328547666 ms.
>         Write Count: 655542
>         Write Latency: 0.041145092152752985 ms.
>         Pending Tasks: 0
>                 Column Family: KvImpressions
>                 SSTable count: 13
>                 Space used (live): 23304548897
>                 Space used (total): 23304548897
>                 Memtable Columns Count: 895
>                 Memtable Data Size: 2108990
>                 Memtable Switch Count: 8
>                 Read Count: 468083
>                 Read Latency: 151.603 ms.
>                 Write Count: 552566
>                 Write Latency: 0.023 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 17398656
>                 Key cache size: 567967
>                 Key cache hit rate: 0.0
>                 Row cache: disabled
>                 Compacted row minimum size: 269
>                 Compacted row maximum size: 54501
>                 Compacted row mean size: 933
> ...
> ----------------
> Keyspace: system
>         Read Count: 1151
>         Read Latency: 872.5014448305822 ms.
>         Write Count: 51215
>         Write Latency: 0.07156788050375866 ms.
>         Pending Tasks: 0
>                 Column Family: HintsColumnFamily
>                 SSTable count: 5
>                 Space used (live): 437366878
>                 Space used (total): 437366878
>                 Memtable Columns Count: 14987
>                 Memtable Data Size: 87975
>                 Memtable Switch Count: 2
>                 Read Count: 1150
>                 Read Latency: NaN ms.
>                 Write Count: 51211
>                 Write Latency: 0.027 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 6
>                 Key cache size: 4
>                 Key cache hit rate: NaN
>                 Row cache: disabled
>                 Compacted row minimum size: 0
>                 Compacted row maximum size: 0
>                 Compacted row mean size: 0
>                 Column Family: LocationInfo
>                 SSTable count: 2
>                 Space used (live): 3504
>                 Space used (total): 3504
>                 Memtable Columns Count: 0
>                 Memtable Data Size: 0
>                 Memtable Switch Count: 1
>                 Read Count: 1
>                 Read Latency: NaN ms.
>                 Write Count: 7
>                 Write Latency: NaN ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 2
>                 Key cache size: 1
>                 Key cache hit rate: NaN
>                 Row cache: disabled
>                 Compacted row minimum size: 0
>                 Compacted row maximum size: 0
>                 Compacted row mean size: 0
>
> On Tue, May 4, 2010 at 10:57 PM, Kyusik Chung <kyusik@discovereads.com> wrote:
>>
>> Im using Ubuntu 8.04 on 64 bit hosts on rackspace cloud.
>>
>> Im in the middle of repeating some perf tests, but so far, I get as-good or slightly better read perf by using standard disk access mode vs mmap.  So far consecutive tests are returning consistent numbers.
>>
>> Im not sure how to explain it...maybe its an ubuntu 8.04 issue with mmap.  Back when I was using mmap, I was definitely seeing the kswapd0 process start using cpu as the box ran out of memory, and read performance significantly degraded.
>>
>> Next, Ill run some tests with mmap_index_only, and Ill test with heavy concurrent writes as well as reads.  Ill let everyone know what I find.
>>
>> Kyusik Chung
>> CEO, Discovereads.com
>> kyusik@discovereads.com
>>
>> On May 4, 2010, at 12:27 PM, Jonathan Ellis wrote:
>>
>> > Are you using 32 bit hosts?  If not don't be scared of mmap using a lot of address space, you have plenty.  It won't make you swap more than using buffered i/o.
>> >
>> > On Tue, May 4, 2010 at 1:57 PM, Ran Tavory <rantav@gmail.com> wrote:
>> >> I canceled mmap and indeed memory usage is sane again. So far performance hasn't been great, but I'll wait and see.
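For reference, the access modes Kyusik is comparing (standard buffered I/O vs mmap vs mmap_index_only) are selected in 0.6's storage-conf.xml. A minimal fragment, assuming the option names as documented for that release (check your own config before relying on this):

```xml
<!-- DiskAccessMode: "auto" picks mmap on 64-bit JVMs,
     "standard" uses buffered I/O,
     "mmap_index_only" memory-maps only the index files -->
<DiskAccessMode>mmap_index_only</DiskAccessMode>
```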
>> >> I'm also interested in a way to cap mmap so I can take advantage of it but not swap the host to death...
>> >>
>> >> On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung <kyusik@discovereads.com> wrote:
>> >>>
>> >>> This sounds just like the slowness I was asking about in another thread - after a lot of reads, the machine uses up all available memory on the box and then starts swapping.
>> >>> My understanding was that mmap helps greatly with read and write perf (until the box starts swapping I guess)...is there any way to use mmap and cap how much memory it takes up?
>> >>> What do people use in production? mmap or no mmap?
>> >>> Thanks!
>> >>> Kyusik Chung
>> >>> On May 4, 2010, at 10:11 AM, Schubert Zhang wrote:
>> >>>
>> >>> 1. When initially startup your nodes, please plan your InitialToken of each node evenly.
>> >>> 2. standard
>> >>>
>> >>> On Tue, May 4, 2010 at 9:09 PM, Boris Shulman wrote:
>> >>>>
>> >>>> I think that the extra (more than 4GB) memory usage comes from the mmaped io, that is why it happens only for reads.
>> >>>>
>> >>>> On Tue, May 4, 2010 at 2:02 PM, Jordan Pittier wrote:
>> >>>>> I'm facing the same issue with swap. It only occurs when I perform read operations (writes are very fast :)). So I can't help you with the memory problem.
>> >>>>>
>> >>>>> But to balance the load evenly between nodes in a cluster just manually fix their token. (the "formula" is i * 2^127 / nb_nodes).
>> >>>>>
>> >>>>> Jordzn
>> >>>>>
>> >>>>> On Tue, May 4, 2010 at 8:20 AM, Ran Tavory <rantav@gmail.com> wrote:
>> >>>>>>
>> >>>>>> I'm looking into performance issues on a 0.6.1 cluster. I see two symptoms:
>> >>>>>> 1. Reads and writes are slow
>> >>>>>> 2. One of the hosts is doing a lot of GC.
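Jordan's token formula can be sketched as follows (a hypothetical helper of mine; assumes the RandomPartitioner's 0..2^127 token ring used in 0.6):

```python
def balanced_tokens(nb_nodes):
    """Evenly spaced InitialTokens: i * 2**127 / nb_nodes for node i,
    the formula quoted above."""
    return [i * 2**127 // nb_nodes for i in range(nb_nodes)]

# e.g. for a 6-node cluster, print each node's InitialToken
for token in balanced_tokens(6):
    print(token)
```

Each node then owns an equal slice of the ring, which avoids the uneven load visible in the `nodetool ring` output below.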
>> >>>>>> 1 is slow in the sense that in normal state the cluster used to make around 3-5k reads and writes per second (6-10k operations per second), but now it's in the order of 200-400 ops per second, sometimes even less.
>> >>>>>> 2 looks like this:
>> >>>>>> $ tail -f /outbrain/cassandra/log/system.log
>> >>>>>> INFO [GC inspection] 2010-05-04 00:42:18,636 GCInspector.java (line 110) GC for ParNew: 672 ms, 166482384 reclaimed leaving 2872087208 used; max is 4432068608
>> >>>>>> INFO [GC inspection] 2010-05-04 00:42:28,638 GCInspector.java (line 110) GC for ParNew: 498 ms, 166493352 reclaimed leaving 2836049448 used; max is 4432068608
>> >>>>>> INFO [GC inspection] 2010-05-04 00:42:38,640 GCInspector.java (line 110) GC for ParNew: 327 ms, 166091528 reclaimed leaving 2796888424 used; max is 4432068608
>> >>>>>> ... and it goes on and on for hours, no stopping...
>> >>>>>> The cluster is made of 6 hosts, 3 in one DC and 3 in another.
>> >>>>>> Each host has 8G RAM.
>> >>>>>> -Xmx=4G
>> >>>>>> For some reason, the load isn't distributed evenly b/w the hosts, although I'm not sure this is the cause for slowness
>> >>>>>> $ nodetool -h localhost -p 9004 ring
>> >>>>>> Address         Status  Load       Range                                     Ring
>> >>>>>>                                    144413773383729447702215082383444206680
>> >>>>>> 192.168.252.99  Up      15.94 GB   66002764663998929243644931915471302076   |<--|
>> >>>>>> 192.168.254.57  Up      19.84 GB   81288739225600737067856268063987022738   |   ^
>> >>>>>> 192.168.254.58  Up      973.78 MB  86999744104066390588161689990810839743   v   |
>> >>>>>> 192.168.252.62  Up      5.18 GB    88308919879653155454332084719458267849   |   ^
>> >>>>>> 192.168.254.59  Up      10.57 GB   142482163220375328195837946953175033937  v   |
>> >>>>>> 192.168.252.61  Up      11.36 GB   144413773383729447702215082383444206680  |-->|
>> >>>>>> The slow host is 192.168.252.61 and it isn't the most loaded one.
>> >>>>>> The host is waiting a lot on IO and the load average is usually 6-7
>> >>>>>> $ w
>> >>>>>> 00:42:56 up 11 days, 13:22, 1 user, load average: 6.21, 5.52, 3.93
>> >>>>>> $ vmstat 5
>> >>>>>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>> >>>>>>  r  b    swpd   free  buff   cache    si   so    bi   bo    in     cs us sy id wa st
>> >>>>>>  0  8 2147844  45744  1816 4457384     6    5    66   32     5      2  1  1 96  2  0
>> >>>>>>  0  8 2147164  49020  1808 4451596   385    0  2345   58  3372   9957  2  2 78 18  0
>> >>>>>>  0  3 2146432  45704  1812 4453956   342    0  2274  108  3937  10732  2  2 78 19  0
>> >>>>>>  0  1 2146252  44696  1804 4453436   345  164  1939  294  3647   7833  2  2 78 18  0
>> >>>>>>  0  1 2145960  46924  1744 4451260   158    0  2423  122  4354  14597  2  2 77 18  0
>> >>>>>>  7  1 2138344  44676   952 4504148  1722  403  1722  406  1388    439 87  0 10  2  0
>> >>>>>>  7  2 2137248  45652   956 4499436  1384  655  1384  658  1356    392 87  0 10  3  0
>> >>>>>>  7  1 2135976  46764   956 4495020  1366  718  1366  718  1395    380 87  0  9  4  0
>> >>>>>>  0  8 2134484  46964   956 4489420  1673  555  1814  586  1601 215590 14  2 68 16  0
>> >>>>>>  0  1 2135388  47444   972 4488516   785  833  2390  995  3812   8305  2  2 77 20  0
>> >>>>>>  0 10 2135164  45928   980 4488796   788  543  2275  626    36
>> >>>>>> So, the host is swapping like crazy...
>> >>>>>> top shows that it's using a lot of memory. As noted before -Xmx=4G and nothing else seems to be using a lot of memory on the host except for the cassandra process, however, of the 8G ram on the host, 92% is used by cassandra. How's that?
>> >>>>>> Top shows there's 3.9g Shared and 7.2g Resident and 15.9g Virtual. Why does it have 15g virtual? And why 7.2 RES? This can explain the slowness in swapping.
>> >>>>>> $ top
>> >>>>>>   PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >>>>>> 20281 cassandr 25   0 15.9g 7.2g 3.9g S 33.3 92.6 175:30.27 java
>> >>>>>> So, can the total memory be controlled?
>> >>>>>> Or perhaps I'm looking in the wrong direction...
>> >>>>>> I've looked at all the cassandra JMX counts and nothing seemed suspicious so far. By suspicious i mean a large number of pending tasks - there were always very small numbers in each pool.
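The VIRT/RES gap Ran is puzzling over is characteristic of mmap: mapped SSTables inflate the virtual size, and pages the kernel has cached for them count toward the resident size on top of the Java heap. A quick way to watch both numbers for a process (a Linux-only sketch of mine, reading /proc; `pid` would be Cassandra's pid, e.g. 20281 from the top output above):

```python
def vm_sizes_kb(pid="self"):
    """Return VmSize (total address space) and VmRSS (resident set)
    in kB, parsed from /proc/<pid>/status (Linux only)."""
    sizes = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmSize", "VmRSS")):
                field, value = line.split(":", 1)
                sizes[field] = int(value.split()[0])  # value is "<n> kB"
    return sizes

print(vm_sizes_kb())  # for the current process
```

Comparing VmRSS against -Xmx shows how much resident memory is mmap/page cache rather than heap.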
>> >>>>>> About read and write latencies, I'm not sure what the normal state is, but here's an example of what I see on the problematic host:
>> >>>>>> #mbean = org.apache.cassandra.service:type=StorageProxy:
>> >>>>>> RecentReadLatencyMicros = 30105.888180684495;
>> >>>>>> TotalReadLatencyMicros = 78543052801;
>> >>>>>> TotalWriteLatencyMicros = 4213118609;
>> >>>>>> RecentWriteLatencyMicros = 1444.4809201925639;
>> >>>>>> ReadOperations = 4779553;
>> >>>>>> RangeOperations = 0;
>> >>>>>> TotalRangeLatencyMicros = 0;
>> >>>>>> RecentRangeLatencyMicros = NaN;
>> >>>>>> WriteOperations = 4740093;
>> >>>>>> And the only pool that I do see some pending tasks is the ROW-READ-STAGE, but it doesn't look like much, usually around 6-8:
>> >>>>>> #mbean = org.apache.cassandra.concurrent:type=ROW-READ-STAGE:
>> >>>>>> ActiveCount = 8;
>> >>>>>> PendingTasks = 8;
>> >>>>>> CompletedTasks = 5427955;
>> >>>>>> Any help finding the solution is appreciated, thanks...
>> >>>>>> Below are a few more JMXes I collected from the system that may be interesting.
>> >>>>>> #mbean = java.lang:type=Memory:
>> >>>>>> Verbose = false;
>> >>>>>> HeapMemoryUsage = { committed = 3767279616; init = 134217728; max = 4293656576; used = 1237105080; };
>> >>>>>> NonHeapMemoryUsage = { committed = 35061760; init = 24313856; max = 138412032; used = 23151320; };
>> >>>>>> ObjectPendingFinalizationCount = 0;
>> >>>>>> #mbean = java.lang:name=ParNew,type=GarbageCollector:
>> >>>>>> LastGcInfo = {
>> >>>>>> GcThreadCount = 11;
>> >>>>>> duration = 136;
>> >>>>>> endTime = 42219272;
>> >>>>>> id = 11719;
>> >>>>>> memoryUsageAfterGc = {
>> >>>>>> ( CMS Perm Gen ) = { key = CMS Perm Gen; value = { committed = 29229056; init = 21757952; max = 88080384; used = 17648848; }; };
>> >>>>>> ( Code Cache ) = { key = Code Cache; value = { committed = 5832704; init = 2555904; max = 50331648; used = 5563520; }; };
>> >>>>>> ( CMS Old Gen ) = { key = CMS Old Gen; value = { committed = 3594133504; init = 112459776; max = 4120510464; used = 964565720; }; };
>> >>>>>> ( Par Eden Space ) = { key = Par Eden Space; value = { committed = 171835392; init = 21495808; max = 171835392; used = 0; }; };
>> >>>>>> ( Par Survivor Space ) = { key = Par Survivor Space; value = { committed = 1310720; init = 131072; max = 1310720; used = 0; }; };
>> >>>>>> };
>> >>>>>> memoryUsageBeforeGc = {
>> >>>>>> ( CMS Perm Gen ) = { key = CMS Perm Gen; value = { committed = 29229056; init = 21757952; max = 88080384; used = 17648848; }; };
>> >>>>>> ( Code Cache ) = { key = Code Cache; value = { committed = 5832704; init = 2555904; max = 50331648; used = 5563520; }; };
>> >>>>>> ( CMS Old Gen ) = { key = CMS Old Gen; value = { committed = 3594133504; init = 112459776; max = 4120510464; used = 959221872; }; };
>> >>>>>> ( Par Eden Space ) = { key = Par Eden Space; value = { committed = 171835392; init = 21495808; max = 171835392; used = 171835392; }; };
>> >>>>>> ( Par Survivor Space ) = { key = Par Survivor Space; value = { committed = 1310720; init = 131072; max = 1310720; used = 0; }; };
>> >>>>>> };
>> >>>>>> startTime = 42219136;
>> >>>>>> };
>> >>>>>> CollectionCount = 11720;
>> >>>>>> CollectionTime = 4561730;
>> >>>>>> Name = ParNew;
>> >>>>>> Valid = true;
>> >>>>>> MemoryPoolNames = [ Par Eden Space, Par Survivor Space ];
>> >>>>>> #mbean = java.lang:type=OperatingSystem:
>> >>>>>> MaxFileDescriptorCount = 63536;
>> >>>>>> OpenFileDescriptorCount = 75;
>> >>>>>> CommittedVirtualMemorySize = 17787711488;
>> >>>>>> FreePhysicalMemorySize = 45522944;
>> >>>>>> FreeSwapSpaceSize = 2123968512;
>> >>>>>> ProcessCpuTime = 12251460000000;
>> >>>>>> TotalPhysicalMemorySize = 8364417024;
>> >>>>>> TotalSwapSpaceSize = 4294959104;
>> >>>>>> Name = Linux;
>> >>>>>> AvailableProcessors = 8;
>> >>>>>> Arch = amd64;
>> >>>>>> SystemLoadAverage = 4.36;
>> >>>>>> Version = 2.6.18-164.15.1.el5;
>> >>>>>> #mbean = java.lang:type=Runtime:
>> >>>>>> Name = 20281@ob1061.nydc1.outbrain.com;
>> >>>>>> ClassPath = /outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../build/classes:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/antlr-3.1.3.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/apache-cassandra-0.6.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/avro-1.2.0-dev.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/clhm-production.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-cli-1.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-codec-1.2.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-collections-3.2.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-lang-2.4.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/google-collections-1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/hadoop-core-0.20.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/high-scale-lib.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/ivy-2.1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jackson-core-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jackson-mapper-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jline-0.9.94.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/json-simple-1.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/libthrift-r917130.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/log4j-1.2.14.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/slf4j-api-1.5.8.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/slf4j-log4j12-1.5.8.jar;
>> >>>>>> BootClassPath = /usr/java/jdk1.6.0_17/jre/lib/alt-rt.jar:/usr/java/jdk1.6.0_17/jre/lib/resources.jar:/usr/java/jdk1.6.0_17/jre/lib/rt.jar:/usr/java/jdk1.6.0_17/jre/lib/sunrsasign.jar:/usr/java/jdk1.6.0_17/jre/lib/jsse.jar:/usr/java/jdk1.6.0_17/jre/lib/jce.jar:/usr/java/jdk1.6.0_17/jre/lib/charsets.jar:/usr/java/jdk1.6.0_17/jre/classes;
>> >>>>>> LibraryPath = /usr/java/jdk1.6.0_17/jre/lib/amd64/server:/usr/java/jdk1.6.0_17/jre/lib/amd64:/usr/java/jdk1.6.0_17/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib;
>> >>>>>> VmName = Java HotSpot(TM) 64-Bit Server VM;
>> >>>>>> VmVendor = Sun Microsystems Inc.;
>> >>>>>> VmVersion = 14.3-b01;
>> >>>>>> BootClassPathSupported = true;
>> >>>>>> InputArguments = [ -ea, -Xms128M, -Xmx4G, -XX:TargetSurvivorRatio=90, -XX:+AggressiveOpts, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:+HeapDumpOnOutOfMemoryError, -XX:SurvivorRatio=128, -XX:MaxTenuringThreshold=0, -Dcom.sun.management.jmxremote.port=9004, -Dcom.sun.management.jmxremote.ssl=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dstorage-config=/outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf, -Dcassandra-pidfile=/var/run/cassandra.pid ];
>> >>>>>> ManagementSpecVersion = 1.2;
>> >>>>>> SpecName = Java Virtual Machine Specification;
>> >>>>>> SpecVendor = Sun Microsystems Inc.;
>> >>>>>> SpecVersion = 1.0;
>> >>>>>> StartTime = 1272911001415;
>> >>>>>> ...
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Have you actually managed to get 10K reads/second, or are yo= u just estimating that you can?  I’ve run into similar issues, but= I never got reads to scale when searching for unique keys even using 40 threads, I did = discover that using 80+ threads, I can actually reduce performance.  I’ve= never gotten more than 200-300 reads/second (steady state) off a 4 cluster node.  I= can get roughly 8K writes/second to the same cluster (although I haven’t test= ed both simultaneously with results worth talking about).

 

From: Ran Tavory [mailto:rantav@gmail.com]
Sent: Wednesday, May 05, 2010 4:59 PM
To: user@cassandra.apache.org
Subject: Re: performance tuning - where does the slowness come from?=

 

let's see if I can make some assertions, feel free to correct me...

 

Well, obviously, reads are much slower in cassandra th= an writes, everyone knows that, but by which factor?

In my case I read/write only one column at a time. Key= , column and value are pretty small (< 200b)

So the numbers are usually - Write Latency: ~0.05ms, R= ead Latency: ~30ms. So it looks like reads are 60x slower, at least on my hardw= are. This happens when cache is cold. If cache is warm reads are better, but unfortunately my cache is usually cold... 

If my application keeps reading a cold cache there's n= othing I can do to make reads faster from cassandra's side. With one client thread this implies 1000/30=3D33 reads/sec, not great.

However, although read latency is a bottleneck, read throughput isn't. So I need to add more reading threads and I can actually = add many of them before read latency starts draining, according to ycsb I can h= ave ~10000 reads/sec (on the cluster they tested, numbers may vary) before read latency starts draining. So, numbers may vary by cluster size, hardware, da= ta size etc, but the idea is - if read latency is 30ms and cache is a miss mos= t of the time, that's normal, just add more reader threads and you get a better throughput.

Sorry if this sounds trivial, I was just trying to imp= rove on the 30ms reads until I realized I actually can't... 

 

On Wed, May 5, 2010 at 7:08 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

 - your key cache isn't warm.  capacity 17M,= size 0.5M, 468083 reads
sounds like most of your reads have been for unique keys.
 - the kind of reads you are doing can have a big effect (mostly
number of columns you are asking for).  column index granularity plays=
a role (for non-rowcached reads); so can column comparator (see e.g.
https://issues.apache.org/jira/browse/CASSANDRA-1043)
 - the slow system reads are all on HH rows, which can get very wide (hence, slow to read the whole row, which is what the HH code does).
clean those out either by bringing back the nodes it's hinting for, or
just removing the HH data files.


On Wed, May 5, 2010 at 10:19 AM, Ran Tavory <rantav@gmail.com> wrote:
> I'm still trying to figure out where my slowness is coming from...
> By now I'm pretty sure it's the reads are slow, but not sure how to improve
> them.
> I'm looking at cfstats. Can you say if there are better configuration<= br> > options? So far I've used all default settings, except for:
>     <Keyspace Name=3D"outbrain_kvdb"> >       <ColumnFamily CompareWith=3D"BytesType" Name=3D"KvImpressions"
> KeysCached=3D"50%"/>
>
>  <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackAwar= eStrategy</ReplicaPlacementStrategy>
>       <ReplicationFactor>2</ReplicationFactor>
>
>  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch<= /EndPointSnitch>
>     </Keyspace>
>
> What does a good read latency look like? I was expecting 10ms, however= so
> far it seems that my KvImpressions read latency is 30ms and in the sys= tem
> keyspace I have 800ms :(
> I thought adding KeysCached=3D"50%" would improve my situation but
> unfortunately looks like the hitrate is about 0. I realize that's
> application specific, but maybe there are other magic bullets...
> Is there something like adding cache to the system keyspace? 800 ms is=
> pretty bad, isn't it?
> See stats below and thanks.
>
> Keyspace: outbrain_kvdb
>         Read Count: 651668
>         Read Latency: 34.18622328547666 ms. >         Write Count: 655542
>         Write Latency: 0.041145092152752985 m= s.
>         Pending Tasks: 0
>                 Column Family: KvImpressions
>                 SSTable count: 13
>                 Space use= d (live): 23304548897
>                 Space use= d (total): 23304548897
>                 Memtable Columns Count: 895
>                 Memtable = Data Size: 2108990
>                 Memtable = Switch Count: 8
>                 Read Coun= t: 468083
>                 Read Late= ncy: 151.603 ms.
>                 Write Cou= nt: 552566
>                 Write Latency: 0.023 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 17398656
>                 Key cache size: 567967
>                 Key cache= hit rate: 0.0
>                 Row cache= : disabled
>                 Compacted= row minimum size: 269
>                 Compacted= row maximum size: 54501
>                 Compacted= row mean size: 933
> ...
> ----------------
> Keyspace: system
>         Read Count: 1151
>         Read Latency: 872.5014448305822 ms. >         Write Count: 51215
>         Write Latency: 0.07156788050375866 ms= .
>         Pending Tasks: 0
>                 Column Family: HintsColumnFamily
>                 SSTable count: 5
>                 Space used (live): 437366878
>                 Space used (total): 437366878
>                 Memtable Columns Count: 14987
>                 Memtable Data Size: 87975
>                 Memtable Switch Count: 2
>                 Read Count: 1150
>                 Read Latency: NaN ms.
>                 Write Count: 51211
>                 Write Latency: 0.027 ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 6
>                 Key cache size: 4
>                 Key cache hit rate: NaN
>                 Row cache: disabled
>                 Compacted row minimum size: 0
>                 Compacted row maximum size: 0
>                 Compacted row mean size: 0
>                 Column Family: LocationInfo
>                 SSTable count: 2
>                 Space used (live): 3504
>                 Space used (total): 3504
>                 Memtable Columns Count: 0
>                 Memtable Data Size: 0
>                 Memtable Switch Count: 1
>                 Read Count: 1
>                 Read Latency: NaN ms.
>                 Write Count: 7
>                 Write Latency: NaN ms.
>                 Pending Tasks: 0
>                 Key cache capacity: 2
>                 Key cache size: 1
>                 Key cache hit rate: NaN
>                 Row cache: disabled
>                 Compacted row minimum size: 0
>                 Compacted row maximum size: 0
>                 Compacted row mean size: 0
>
> On Tue, May 4, 2010 at 10:57 PM, Kyusik Chung <kyusik@discovereads.com>
> wrote:
>>
>> I'm using Ubuntu 8.04 on 64 bit hosts on rackspace cloud.
>>
>> I'm in the middle of repeating some perf tests, but so far, I get as-good
>> or slightly better read perf by using standard disk access mode vs mmap.  So
>> far consecutive tests are returning consistent numbers.
>>
>> I'm not sure how to explain it... maybe it's an Ubuntu 8.04 issue with mmap.
>>  Back when I was using mmap, I was definitely seeing the kswapd0 process
>> start using CPU as the box ran out of memory, and read performance
>> significantly degraded.
>>
>> Next, I'll run some tests with mmap_index_only, and I'll test with heavy
>> concurrent writes as well as reads.  I'll let everyone know what I find.
>>
>> Kyusik Chung
>> CEO, Discovereads.com
>> kyusik@discovereads.com
>>
>> On May 4, 2010, at 12:27 PM, Jonathan Ellis wrote:
>>
>> > Are you using 32 bit hosts?  If not don't be scared of mmap using a
>> > lot of address space, you have plenty.  It won't make you swap more
>> > than using buffered i/o.
>> >
>> > On Tue, May 4, 2010 at 1:57 PM, Ran Tavory <rantav@gmail.com> wrote:
>> >> I canceled mmap and indeed memory usage is sane again. So far
>> >> performance
>> >> hasn't been great, but I'll wait and see.
>> >> I'm also interested in a way to cap mmap so I can take advantage of it
>> >> but
>> >> not swap the host to death...
>> >>
>> >> On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung <kyusik@discovereads.com>
>> >> wrote:
>> >>>
>> >>> This sounds just like the slowness I was asking about in another
>> >>> thread -
>> >>> after a lot of reads, the machine uses up all available memory on the
>> >>> box
>> >>> and then starts swapping.
>> >>> My understanding was that mmap helps greatly with read and write perf
>> >>> (until the box starts swapping I guess)... is there any way to use mmap
>> >>> and
>> >>> cap how much memory it takes up?
>> >>> What do people use in production?  mmap or no mmap?
>> >>> Thanks!
>> >>> Kyusik Chung
>> >>> On May 4, 2010, at 10:11 AM, Schubert Zhang wrote:
>> >>>
>> >>> 1. When initially startup your nodes, please plan your InitialToken of
>> >>> each node evenly.
>> >>> 2. <DiskAccessMode>standard</DiskAccessMode>
>> >>>
>> >>> On Tue, May 4, 2010 at 9:09 PM, Boris Shulman <shulmanb@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> I think that the extra (more than 4GB) memory usage comes from the
>> >>>> mmaped io, that is why it happens only for reads.
>> >>>>
>> >>>> On Tue, May 4, 2010 at 2:02 PM, Jordan Pittier
>> >>>> <jordan.pittier@gmail.com>
>> >>>> wrote:
>> >>>>> I'm facing the same issue with swap. It only occurs when I perform
>> >>>>> read
>> >>>>> operations (writes are very fast :)). So I can't help you with the
>> >>>>> memory
>> >>>>> problem.
>> >>>>>
>> >>>>> But to balance the load evenly between nodes in the cluster, just
>> >>>>> manually
>> >>>>> fix
>> >>>>> their tokens (the "formula" is i * 2^127 / nb_nodes).
>> >>>>>
>> >>>>> Jordzn
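For concreteness, the token formula quoted above can be sketched as follows. This is a minimal sketch for Cassandra's RandomPartitioner, whose token space spans 0 to 2^127; the helper name is illustrative, not from the thread.

```python
# Evenly spaced InitialTokens for the RandomPartitioner.
# Formula from the thread: token_i = i * 2**127 / nb_nodes.
def initial_tokens(nb_nodes):
    return [i * 2**127 // nb_nodes for i in range(nb_nodes)]

# The cluster discussed in this thread has 6 hosts:
for token in initial_tokens(6):
    print(token)
```

Each node's InitialToken then goes into its own storage-conf.xml before first startup, giving every node an equal slice of the ring.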
>> >>>>>
>> >>>>> On Tue, May 4, 2010 at 8:20 AM, Ran Tavory <rantav@gmail.com> wrote:
>> >>>>>>
>> >>>>>> I'm looking into performance issues on a 0.6.1 cluster. I see two
>> >>>>>> symptoms:
>> >>>>>> 1. Reads and writes are slow
>> >>>>>> 2. One of the hosts is doing a lot of GC.
>> >>>>>> 1 is slow in the sense that in normal state the cluster used to
>> >>>>>> make
>> >>>>>> around 3-5k reads and writes per second (6-10k operations per
>> >>>>>> second),
>> >>>>>> but
>> >>>>>> now it's in the order of 200-400 ops per second, sometimes even
>> >>>>>> less.
>> >>>>>> 2 looks like this:
>> >>>>>> $ tail -f /outbrain/cassandra/log/system.log
>> >>>>>>  INFO [GC inspection] 2010-05-04 00:42:18,636 GCInspector.java (line 110)
>> >>>>>> GC for ParNew: 672 ms, 166482384 reclaimed leaving 2872087208 used; max is 4432068608
>> >>>>>>  INFO [GC inspection] 2010-05-04 00:42:28,638 GCInspector.java (line 110)
>> >>>>>> GC for ParNew: 498 ms, 166493352 reclaimed leaving 2836049448 used; max is 4432068608
>> >>>>>>  INFO [GC inspection] 2010-05-04 00:42:38,640 GCInspector.java (line 110)
>> >>>>>> GC for ParNew: 327 ms, 166091528 reclaimed leaving 2796888424 used; max is 4432068608
>> >>>>>> ... and it goes on and on for hours, no stopping...
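Those GCInspector lines also show how full the heap stays after each collection. A quick parse of one of the lines quoted above (log format as shown; the parsing code is mine, not from the thread) puts the heap at roughly two-thirds of its 4G ceiling even right after a ParNew collection:

```python
import re

# Parse a GCInspector line quoted above and report post-GC heap utilization.
line = ("GC for ParNew: 672 ms, 166482384 reclaimed leaving "
        "2872087208 used; max is 4432068608")
m = re.search(r"GC for (\w+): (\d+) ms, (\d+) reclaimed leaving (\d+) used; "
              r"max is (\d+)", line)
gc_name = m.group(1)
pause_ms, reclaimed, used, heap_max = (int(g) for g in m.groups()[1:])
print(f"{gc_name}: {pause_ms} ms pause, heap {used / heap_max:.0%} full")
# prints "ParNew: 672 ms pause, heap 65% full"
```

A heap that stays this full between collections, combined with the swapping shown later in the thread, is consistent with back-to-back GC cycles that never free much.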
>> >>>>>> The cluster is made of 6 hosts, 3 in one DC and 3 in another.
>> >>>>>> Each host has 8G RAM.
>> >>>>>> -Xmx=4G
>> >>>>>> For some reason, the load isn't distributed evenly b/w the hosts,
>> >>>>>> although
>> >>>>>> I'm not sure this is the cause for slowness
>> >>>>>> $ nodetool -h localhost -p 9004 ring
>> >>>>>> Address         Status   Load       Range                                       Ring
>> >>>>>>                                     144413773383729447702215082383444206680
>> >>>>>> 192.168.252.99  Up       15.94 GB   66002764663998929243644931915471302076     |<--|
>> >>>>>> 192.168.254.57  Up       19.84 GB   81288739225600737067856268063987022738     |   ^
>> >>>>>> 192.168.254.58  Up       973.78 MB  86999744104066390588161689990810839743     v   |
>> >>>>>> 192.168.252.62  Up       5.18 GB    88308919879653155454332084719458267849     |   ^
>> >>>>>> 192.168.254.59  Up       10.57 GB   142482163220375328195837946953175033937    v   |
>> >>>>>> 192.168.252.61  Up       11.36 GB   144413773383729447702215082383444206680    |-->|
>> >>>>>> The slow host is 192.168.252.61 and it isn't the most loaded one.
>> >>>>>> The host is waiting a lot on IO and the load average is usually 6-7
>> >>>>>> $ w
>> >>>>>>  00:42:56 up 11 days, 13:22,  1 user,  load average: 6.21, 5.52, 3.93
>> >>>>>> $ vmstat 5
>> >>>>>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>> >>>>>>  r  b   swpd   free   buff   cache    si   so    bi   bo   in     cs us sy id wa st
>> >>>>>>  0  8 2147844  45744   1816 4457384    6    5    66   32    5      2  1  1 96  2  0
>> >>>>>>  0  8 2147164  49020   1808 4451596  385    0  2345   58 3372   9957  2  2 78 18  0
>> >>>>>>  0  3 2146432  45704   1812 4453956  342    0  2274  108 3937  10732  2  2 78 19  0
>> >>>>>>  0  1 2146252  44696   1804 4453436  345  164  1939  294 3647   7833  2  2 78 18  0
>> >>>>>>  0  1 2145960  46924   1744 4451260  158    0  2423  122 4354  14597  2  2 77 18  0
>> >>>>>>  7  1 2138344  44676    952 4504148 1722  403  1722  406 1388    439 87  0 10  2  0
>> >>>>>>  7  2 2137248  45652    956 4499436 1384  655  1384  658 1356    392 87  0 10  3  0
>> >>>>>>  7  1 2135976  46764    956 4495020 1366  718  1366  718 1395    380 87  0  9  4  0
>> >>>>>>  0  8 2134484  46964    956 4489420 1673  555  1814  586 1601 215590 14  2 68 16  0
>> >>>>>>  0  1 2135388  47444    972 4488516  785  833  2390  995 3812   8305  2  2 77 20  0
>> >>>>>>  0 10 2135164  45928    980 4488796  788  543  2275  626 36
>> >>>>>> So, the host is swapping like crazy...
>> >>>>>> top shows that it's using a lot of memory. As noted before -Xmx=4G
>> >>>>>> and
>> >>>>>> nothing else seems to be using a lot of memory on the host except
>> >>>>>> for
>> >>>>>> the
>> >>>>>> cassandra process, however, of the 8G ram on the host, 92% is used
>> >>>>>> by
>> >>>>>> cassandra. How's that?
>> >>>>>> Top shows there's 3.9g Shared and 7.2g Resident and 15.9g Virtual.
>> >>>>>> Why
>> >>>>>> does it have 15g virtual? And why 7.2 RES? This can explain the
>> >>>>>> slowness in
>> >>>>>> swapping.
>> >>>>>> $ top
>> >>>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
>> >>>>>>  COMMAND
>> >>>>>>
>> >>>>>>
>> >>>>>> 20281 cassandr  25   0 15.9g 7.2g 3.9g S 33.3 92.6 175:30.27 java
>> >>>>>> So, can the total memory be controlled?
>> >>>>>> Or perhaps I'm looking in the wrong direction...
>> >>>>>> I've looked at all the cassandra JMX counts and nothing seemed
>> >>>>>> suspicious
>> >>>>>> so far. By suspicious I mean a large number of pending tasks -
>> >>>>>> there
>> >>>>>> were
>> >>>>>> always very small numbers in each pool.
>> >>>>>> About read and write latencies, I'm not sure what the normal state
>> >>>>>> is,
>> >>>>>> but
>> >>>>>> here's an example of what I see on the problematic host:
>> >>>>>> #mbean = org.apache.cassandra.service:type=StorageProxy:
>> >>>>>> RecentReadLatencyMicros = 30105.888180684495;
>> >>>>>> TotalReadLatencyMicros = 78543052801;
>> >>>>>> TotalWriteLatencyMicros = 4213118609;
>> >>>>>> RecentWriteLatencyMicros = 1444.4809201925639;
>> >>>>>> ReadOperations = 4779553;
>> >>>>>> RangeOperations = 0;
>> >>>>>> TotalRangeLatencyMicros = 0;
>> >>>>>> RecentRangeLatencyMicros = NaN;
>> >>>>>> WriteOperations = 4740093;
>> >>>>>> And the only pool that I do see some pending tasks is the
>> >>>>>> ROW-READ-STAGE,
>> >>>>>> but it doesn't look like much, usually around 6-8:
>> >>>>>> #mbean = org.apache.cassandra.concurrent:type=ROW-READ-STAGE:
>> >>>>>> ActiveCount = 8;
>> >>>>>> PendingTasks = 8;
>> >>>>>> CompletedTasks = 5427955;
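As a rough sanity check, Little's law (throughput ≈ concurrency / latency) ties these JMX numbers together: 8 active ROW-READ-STAGE tasks at the ~30 ms recent read latency reported above caps the node at a few hundred reads/sec, in line with the 200-400 ops/sec observed. A sketch of the arithmetic (numbers copied from the dump above):

```python
# Little's law: throughput ~= concurrency / latency.
active_tasks = 8                                  # ActiveCount in ROW-READ-STAGE
read_latency_s = 30105.888180684495 / 1_000_000   # RecentReadLatencyMicros -> seconds

throughput = active_tasks / read_latency_s
print(f"~{throughput:.0f} reads/sec")  # prints "~266 reads/sec"
```

So with reads this slow, a stage pool that is "only" 8 deep is already the bottleneck; the fix is lowering per-read latency (cache hits, less swapping), not a bigger queue.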
>> >>>>>> Any help finding the solution is apprecia= ted, thanks...
>> >>>>>> Below are a few more JMXes I collected from the system that may be
>> >>>>>> interesting.
>> >>>>>> #mbean = java.lang:type=Memory:
>> >>>>>> Verbose = false;
>> >>>>>> HeapMemoryUsage = {
>> >>>>>>   committed = 3767279616;
>> >>>>>>   init = 134217728;
>> >>>>>>   max = 4293656576;
>> >>>>>>   used = 1237105080;
>> >>>>>>  };
>> >>>>>> NonHeapMemoryUsage = {
>> >>>>>>   committed = 35061760;
>> >>>>>>   init = 24313856;
>> >>>>>>   max = 138412032;
>> >>>>>>   used = 23151320;
>> >>>>>>  };
>> >>>>>> ObjectPendingFinalizationCount = 0;
>> >>>>>> #mbean = java.lang:name=ParNew,type=GarbageCollector:
>> >>>>>> LastGcInfo = {
>> >>>>>>   GcThreadCount = 11;
>> >>>>>>   duration = 136;
>> >>>>>>   endTime = 42219272;
>> >>>>>>   id = 11719;
>> >>>>>>   memoryUsageAfterGc = {
>> >>>>>>     ( CMS Perm Gen ) = {
>> >>>>>>       key = CMS Perm Gen;
>> >>>>>>       value = {
>> >>>>>>         committed = 29229056;
>> >>>>>>         init = 21757952;
>> >>>>>>         max = 88080384;
>> >>>>>>         used = 17648848;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( Code Cache ) = {
>> >>>>>>       key = Code Cache;
>> >>>>>>       value = {
>> >>>>>>         committed = 5832704;
>> >>>>>>         init = 2555904;
>> >>>>>>         max = 50331648;
>> >>>>>>         used = 5563520;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( CMS Old Gen ) = {
>> >>>>>>       key = CMS Old Gen;
>> >>>>>>       value = {
>> >>>>>>         committed = 3594133504;
>> >>>>>>         init = 112459776;
>> >>>>>>         max = 4120510464;
>> >>>>>>         used = 964565720;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( Par Eden Space ) = {
>> >>>>>>       key = Par Eden Space;
>> >>>>>>       value = {
>> >>>>>>         committed = 171835392;
>> >>>>>>         init = 21495808;
>> >>>>>>         max = 171835392;
>> >>>>>>         used = 0;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( Par Survivor Space ) = {
>> >>>>>>       key = Par Survivor Space;
>> >>>>>>       value = {
>> >>>>>>         committed = 1310720;
>> >>>>>>         init = 131072;
>> >>>>>>         max = 1310720;
>> >>>>>>         used = 0;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>    };
>> >>>>>>   memoryUsageBeforeGc = {
>> >>>>>>     ( CMS Perm Gen ) = {
>> >>>>>>       key = CMS Perm Gen;
>> >>>>>>       value = {
>> >>>>>>         committed = 29229056;
>> >>>>>>         init = 21757952;
>> >>>>>>         max = 88080384;
>> >>>>>>         used = 17648848;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( Code Cache ) = {
>> >>>>>>       key = Code Cache;
>> >>>>>>       value = {
>> >>>>>>         committed = 5832704;
>> >>>>>>         init = 2555904;
>> >>>>>>         max = 50331648;
>> >>>>>>         used = 5563520;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( CMS Old Gen ) = {
>> >>>>>>       key = CMS Old Gen;
>> >>>>>>       value = {
>> >>>>>>         committed = 3594133504;
>> >>>>>>         init = 112459776;
>> >>>>>>         max = 4120510464;
>> >>>>>>         used = 959221872;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( Par Eden Space ) = {
>> >>>>>>       key = Par Eden Space;
>> >>>>>>       value = {
>> >>>>>>         committed = 171835392;
>> >>>>>>         init = 21495808;
>> >>>>>>         max = 171835392;
>> >>>>>>         used = 171835392;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>     ( Par Survivor Space ) = {
>> >>>>>>       key = Par Survivor Space;
>> >>>>>>       value = {
>> >>>>>>         committed = 1310720;
>> >>>>>>         init = 131072;
>> >>>>>>         max = 1310720;
>> >>>>>>         used = 0;
>> >>>>>>        };
>> >>>>>>      };
>> >>>>>>    };
>> >>>>>>   startTime = 42219136;
>> >>>>>>  };
>> >>>>>> CollectionCount = 11720;
>> >>>>>> CollectionTime = 4561730;
>> >>>>>> Name = ParNew;
>> >>>>>> Valid = true;
>> >>>>>> MemoryPoolNames = [ Par Eden Space, Par Survivor Space ];
>> >>>>>> #mbean = java.lang:type=OperatingSystem:
>> >>>>>> MaxFileDescriptorCount = 63536;
>> >>>>>> OpenFileDescriptorCount = 75;
>> >>>>>> CommittedVirtualMemorySize = 17787711488;
>> >>>>>> FreePhysicalMemorySize = 45522944;
>> >>>>>> FreeSwapSpaceSize = 2123968512;
>> >>>>>> ProcessCpuTime = 12251460000000;
>> >>>>>> TotalPhysicalMemorySize = 8364417024;
>> >>>>>> TotalSwapSpaceSize = 4294959104;
>> >>>>>> Name = Linux;
>> >>>>>> AvailableProcessors = 8;
>> >>>>>> Arch = amd64;
>> >>>>>> SystemLoadAverage = 4.36;
>> >>>>>> Version = 2.6.18-164.15.1.el5;
>> >>>>>> #mbean = java.lang:type=Runtime:
>> >>>>>> Name = 20281@ob1061.nydc1.outbrain.com;
>> >>>>>>
>> >>>>>> ClassPath =
>> >>>>>> /outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../build/classes:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/antlr-3.1.3.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/apache-cassandra-0.6.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/avro-1.2.0-dev.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/clhm-production.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-cli-1.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-codec-1.2.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-collections-3.2.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/commons-lang-2.4.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/google-collections-1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/hadoop-core-0.20.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/high-scale-lib.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/ivy-2.1.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jackson-core-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jackson-mapper-asl-1.4.0.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/jline-0.9.94.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/json-simple-1.1.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/libthrift-r917130.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/log4j-1.2.14.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/slf4j-api-1.5.8.jar:/outbrain/cassandra/apache-cassandra-0.6.1/bin/../lib/slf4j-log4j12-1.5.8.jar;
>> >>>>>>
>> >>>>>> BootClassPath =
>> >>>>>> /usr/java/jdk1.6.0_17/jre/lib/alt-rt.jar:/usr/java/jdk1.6.0_17/jre/lib/resources.jar:/usr/java/jdk1.6.0_17/jre/lib/rt.jar:/usr/java/jdk1.6.0_17/jre/lib/sunrsasign.jar:/usr/java/jdk1.6.0_17/jre/lib/jsse.jar:/usr/java/jdk1.6.0_17/jre/lib/jce.jar:/usr/java/jdk1.6.0_17/jre/lib/charsets.jar:/usr/java/jdk1.6.0_17/jre/classes;
>> >>>>>>
>> >>>>>> LibraryPath =
>> >>>>>> /usr/java/jdk1.6.0_17/jre/lib/amd64/server:/usr/java/jdk1.6.0_17/jre/lib/amd64:/usr/java/jdk1.6.0_17/jre/../lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib;
>> >>>>>>
>> >>>>>> VmName = Java HotSpot(TM) 64-Bit Server VM;
>> >>>>>>
>> >>>>>> VmVendor = Sun Microsystems Inc.;
>> >>>>>>
>> >>>>>> VmVersion = 14.3-b01;
>> >>>>>>
>> >>>>>> BootClassPathSupported = true;
>> >>>>>>
>> >>>>>> InputArguments = [ -ea, -Xms128M, -Xmx4G,
>> >>>>>> -XX:TargetSurvivorRatio=90,
>> >>>>>> -XX:+AggressiveOpts, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
>> >>>>>> -XX:+CMSParallelRemarkEnabled, -XX:+HeapDumpOnOutOfMemoryError,
>> >>>>>> -XX:SurvivorRatio=128, -XX:MaxTenuringThreshold=0,
>> >>>>>> -Dcom.sun.management.jmxremote.port=9004,
>> >>>>>> -Dcom.sun.management.jmxremote.ssl=false,
>> >>>>>> -Dcom.sun.management.jmxremote.authenticate=false,
>> >>>>>> -Dstorage-config=/outbrain/cassandra/apache-cassandra-0.6.1/bin/../conf,
>> >>>>>> -Dcassandra-pidfile=/var/run/cassandra.pid ];
>> >>>>>>
>> >>>>>> ManagementSpecVersion = 1.2;
>> >>>>>>
>> >>>>>> SpecName = Java Virtual Machine Specification;
>> >>>>>>
>> >>>>>> SpecVendor = Sun Microsystems Inc.;
>> >>>>>>
>> >>>>>> SpecVersion = 1.0;
>> >>>>>>
>> >>>>>> StartTime = 1272911001415;
>> >>>>>> ...
>> >>>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > Project Chair, Apache Cassandra
>> > co-founder of Riptano, the source for professional Cassandra support
>> > http://riptano.com
>>
>
>


--

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

 
