From: "Desimpel, Ignace"
To: "user@cassandra.apache.org"
Subject: Too many open files and stopped compaction with many pending compaction tasks
Date: Thu, 27 Jun 2013 13:24:11 +0000

On a test cluster of 3 Cassandra servers (version 1.2.5, replication factor 1, leveled compaction), I ran a store last night and did not see any problem with Cassandra. On all 3 machines compaction stopped several hours ago. However, one machine reports 650 pending compaction tasks (via JMX).

compaction_throughput_mb_per_sec is 0.

concurrent_compactors is 3.

multithreaded_compaction = false.

No other load on these machines.

 

And when I start querying (using Thrift), I get a 'too many open files' error on the machine with the pending compaction tasks.

The limits.conf setting for nofile is 65536.

Using 'lsof' and 'wc -l' I get a count of 59577 open files for Cassandra.

Total count of keyspace files on disk: 20464.
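To put those three numbers side by side, here is a minimal sketch. It treats every lsof line as one open handle, which is an assumption: lsof also lists entries such as memory-mapped regions that hold no descriptor, so the real descriptor count is probably somewhat lower.

```python
nofile_limit = 65536   # nofile setting from limits.conf
lsof_lines = 59577     # count obtained with lsof and wc -l for Cassandra
files_on_disk = 20464  # total keyspace files on disk

# Fraction of the limit in use, if every lsof line were a descriptor.
limit_used = lsof_lines / nofile_limit
# Average number of lsof entries per file that exists on disk.
entries_per_file = lsof_lines / files_on_disk

print(f"{limit_used:.1%} of the nofile limit")          # 90.9% of the nofile limit
print(f"{entries_per_file:.2f} lsof entries per file")  # 2.91 lsof entries per file
```

That is almost three lsof entries for every file on disk, which suggests the same files are being opened more than once.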

 

The 3 machines have a roughly equal data load of about 60 GB. Two machines have zero or just one unleveled sstable in any keyspace, but on the troubled machine one keyspace has 670 unleveled sstables. Its sstable count per level is [670, 28, 106, 14], thus 818 sstables. An 'ls' on that directory counts 5729 files, which roughly corresponds to the 818 sstables at 7 files per sstable (818 x 7 = 5726).
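A quick sanity check of that arithmetic, as a small sketch using only the numbers quoted above (the variable names are mine):

```python
# Sstable count per level as reported: unleveled (L0) first, then L1, L2, L3.
level_counts = [670, 28, 106, 14]
files_per_sstable = 7   # component files per sstable, as observed on disk
files_listed = 5729     # result of 'ls' on the keyspace directory

total_sstables = sum(level_counts)
expected_files = total_sstables * files_per_sstable

print(total_sstables)                 # 818
print(expected_files)                 # 5726
print(files_listed - expected_files)  # 3
```

So the directory listing is within 3 files of what 818 sstables would account for.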

 

After a restart of that machine I get 4037 open files for Cassandra, and compaction has restarted. Once it finished I get SSTableCountPerLevel = [0, 10, 109, 644].

Also, compaction reports a speed of 2.5 MB per second, which seems slow to me. CPU is below 10%, disk at 15% with peaks to 45% (15000 rpm SCSI). 14 GB free memory.

So I am puzzled about the number of open files, the number of unleveled sstables, and the not so fast compaction.

Is there anything that can be done? Or anything I should do so that next time I can get more useful information?

Regards,

Ignace

 

Example output of lsof:

java 10968 root 483r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 484u REG 8,1  33554432 29229231 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568123.log
java 10968 root 485r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 486r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 487r REG 8,17 39967253 14158943 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Data.db
java 10968 root 488r REG 8,17 58641524 14158942 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Index.db
java 10968 root 489r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 490r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 491r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 492u REG 8,1  33554432 29230501 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568134.log
java 10968 root 493r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 494r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 495r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 497u REG 8,1  33554432 29242455 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568126.log
java 10968 root 498r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 499r REG 8,17 39725539 14160146 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Data.db
java 10968 root 500r REG 8,17 56369841 14160005 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-1019-Index.db
java 10968 root 502r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 504r REG 8,17  1989198 14163384 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-922-Data.db
java 10968 root 505r REG 8,17 40679209 14161763 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-543-Data.db
java 10968 root 506u REG 8,1  33554432 29250917 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568106.log
java 10968 root 507r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 508r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 509r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 510r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 511u REG 8,1  33554432 29229238 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568108.log
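The same Data.db file shows up under many different descriptor numbers in this output. To see which files hold the most descriptors, the lsof text could be tallied per path. This is only a sketch: the function name `tally_open_paths` and the `SAMPLE` string are mine, and it assumes the usual nine-column lsof layout (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME). It keeps only lines whose FD column starts with a number, so entries like `mem` or `cwd`, which hold no descriptor, are skipped.

```python
import re
from collections import Counter

# A real descriptor has an FD column that starts with digits, e.g. "483r".
FD_PATTERN = re.compile(r"^\d+")


def tally_open_paths(lsof_text):
    """Return a Counter mapping file path -> number of open descriptors."""
    counts = Counter()
    for line in lsof_text.splitlines():
        # COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        fields = line.split(None, 8)
        if len(fields) < 9 or not FD_PATTERN.match(fields[3]):
            continue  # header line, short line, or a non-descriptor entry
        counts[fields[8]] += 1
    return counts


# First six lines of the lsof output shown in this message.
SAMPLE = """\
java 10968 root 483r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 484u REG 8,1 33554432 29229231 /home/cassandra/deployed/data/cdi.cassandra.cdi/dbcommitlog/CommitLog-2-1372260568123.log
java 10968 root 485r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 486r REG 8,17 10507031 14156174 /media/datadrive1/dbdatafile/Ks100K/ReverseLabelFunction/Ks100K-ReverseLabelFunction-ic-1512-Data.db
java 10968 root 487r REG 8,17 39967253 14158943 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Data.db
java 10968 root 488r REG 8,17 58641524 14158942 /media/datadrive1/dbdatafile/Ks100K/ReverseStringFunction/Ks100K-ReverseStringFunction-ic-481-Index.db
"""

# Print the paths with the most descriptors first.
for path, n in tally_open_paths(SAMPLE).most_common():
    print(n, path)
```

For these six lines the ic-1512 Data.db file already accounts for 3 descriptors. Running the same tally over the full lsof output before a restart would show whether a handful of sstables is responsible for most of the open files.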
