Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 77C8697EE for ; Tue, 5 Jun 2012 04:02:13 +0000 (UTC) Received: (qmail 55038 invoked by uid 500); 5 Jun 2012 04:02:11 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 54700 invoked by uid 500); 5 Jun 2012 04:02:10 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 54673 invoked by uid 99); 5 Jun 2012 04:02:09 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Jun 2012 04:02:09 +0000 X-ASF-Spam-Status: No, hits=-2.8 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_HI,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [143.182.124.37] (HELO mga14.intel.com) (143.182.124.37) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Jun 2012 04:02:04 +0000 Received: from azsmga002.ch.intel.com ([10.2.17.35]) by azsmga102.ch.intel.com with ESMTP; 04 Jun 2012 21:01:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.71,315,1320652800"; d="scan'208,217";a="107881435" Received: from azsmsx601.amr.corp.intel.com ([10.2.121.193]) by AZSMGA002.ch.intel.com with ESMTP; 04 Jun 2012 21:01:36 -0700 Received: from fmsmsx151.amr.corp.intel.com (10.19.17.220) by azsmsx601.amr.corp.intel.com (10.2.121.193) with Microsoft SMTP Server (TLS) id 8.2.255.0; Mon, 4 Jun 2012 21:01:35 -0700 Received: from fmsmsx101.amr.corp.intel.com ([169.254.1.33]) by FMSMSX151.amr.corp.intel.com ([169.254.6.233]) with mapi id 14.01.0355.002; Mon, 4 Jun 2012 21:01:35 -0700 From: "Poziombka, Wade L" To: "user@cassandra.apache.org" Subject: RE: memory issue on 1.1.0 Thread-Topic: memory issue on 1.1.0 Thread-Index: Ac1B3lnh5s9+1+fvRku4HYw5xZMA4gA8642AAAyFjDAADTKVsA== Date: Tue, 5 Jun 2012 04:01:34 +0000 Message-ID: References: <4DADFB65-A79C-4B59-B074-2AB158657D38@thelastpickle.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.1.200.107] Content-Type: multipart/alternative; boundary="_000_BEF40F3557A6B24CB87D1F17775EA4FD0FF44382FMSMSX101amrcor_" MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org --_000_BEF40F3557A6B24CB87D1F17775EA4FD0FF44382FMSMSX101amrcor_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable I have repeated the test on two quite large machines 12 core, 64 GB as5 box= es and still observed the problem. Interestingly about at the same point. Anything I can monitor... perhaps I'll hook the Yourkit profiler up to it t= o see if there is some kind of leak? Wade From: Poziombka, Wade L Sent: Monday, June 04, 2012 7:23 PM To: user@cassandra.apache.org Subject: RE: memory issue on 1.1.0 What JVM settings do you have? -Xms8G -Xmx8G -Xmn800m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=3D8 -XX:MaxTenuringThreshold=3D1 -XX:CMSInitiatingOccupancyFraction=3D75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.rmi.server.hostname=3D127.0.0.1 -Djava.net.preferIPv4Stack=3Dtrue -Dcassandra-pidfile=3Dcassandra.pid What is the machine spec ? It is an RH AS5 x64 16gb memory 2 CPU cores 2.8 Ghz As it turns out it is somewhat wimpier than I thought. While weak on it do= es have a good amount of memory. It is paired with a larger machine. What settings do you have for key and row cache ? A: All the defaults. (yaml "template" attached); Do the CF's have secondary indexes ? A: Yes one has two. One of them is used in the key slice used to get the r= ow keys used to do the further mutations. How many clients / requests per second ? A: One client process with 10 threads connected to one of the two nodes in = the cluster. On thread reading the slice and putting work in a queue. 9 o= thers reading from this queue and applying the mutations. Mutations are co= mpleting at about 20,000/minute roughly. From: aaron morton [mailto:aaron@thelastpickle.com] Sent: Monday, June 04, 2012 4:17 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Had a look at the log, this message INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2= 772) Unable to reduce heap usage since there are no dirty column families appears correct, it happens after some flush activity and there are not CF'= s with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling i= t better. What JVM settings do you have? What is the machine spec ? What settings do you have for key and row cache ? Do the CF's have secondary indexes ? How many clients / requests per second ? Cheers ----------------- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote: Running a very write intensive (new column, delete old column etc.) process= and failing on memory. Log file attached. Curiously when I add new data I have never seen this have in past sent hund= reds of millions "new" transactions. It seems to be when I modify. my pro= cess is as follows key slice to get columns to modify in batches of 100, in separate threads m= odify those columns. I advance the slice with the start key each with last= key in previous batch. Mutations done are update a column value in one co= lumn family(token), delete column and add new column in another (pan). Runs well until after about 5 million rows then it seems to run out of memo= ry. Note that these column families are quite small. WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145)= Heap is 0.7967470834946492 full. You may need to reduce memtable and/or c= ache sizes. Cassandra will now flush up to the two largest memtables to fr= ee up memory. Adjust flush_largest_memtables_at threshold in cassandra.yam= l if you don't want Cassandra to do this automatically INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2= 772) Unable to reduce heap usage since there are no dirty column families INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetA= ddress /10.230.34.170 is now UP INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122)= GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 850604851= 2 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122)= GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; m= ax is 8506048512 ---------------- Keyspace: keyspace Read Count: 50042632 Read Latency: 0.23157864418482224 ms. Write Count: 44948323 Write Latency: 0.019460829472992797 ms. Pending Tasks: 0 Column Family: pan SSTable count: 5 Space used (live): 1977467326 Space used (total): 1977467326 Number of Keys (estimate): 16334848 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 74 Read Count: 14985122 Read Latency: 0.408 ms. Write Count: 19972441 Write Latency: 0.022 ms. Pending Tasks: 0 Bloom Filter False Postives: 829 Bloom Filter False Ratio: 0.00073 Bloom Filter Space Used: 37048400 Compacted row minimum size: 125 Compacted row maximum size: 149 Compacted row mean size: 149 Column Family: token SSTable count: 4 Space used (live): 1250973873 Space used (total): 1250973873 Number of Keys (estimate): 14217216 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 49 Read Count: 30059563 Read Latency: 0.167 ms. Write Count: 14985488 Write Latency: 0.014 ms. Pending Tasks: 0 Bloom Filter False Postives: 13642 Bloom Filter False Ratio: 0.00322 Bloom Filter Space Used: 28002984 Compacted row minimum size: 150 Compacted row maximum size: 258 Compacted row mean size: 224 Column Family: counters SSTable count: 2 Space used (live): 561549994 Space used (total): 561549994 Number of Keys (estimate): 9985024 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 38 Read Count: 4997947 Read Latency: 0.092 ms. Write Count: 9990394 Write Latency: 0.023 ms. Pending Tasks: 0 Bloom Filter False Postives: 191 Bloom Filter False Ratio: 0.37525 Bloom Filter Space Used: 18741152 Compacted row minimum size: 125 Compacted row maximum size: 179 Compacted row mean size: 150 ---------------- --_000_BEF40F3557A6B24CB87D1F17775EA4FD0FF44382FMSMSX101amrcor_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

I have repeated the test = on two quite large machines 12 core, 64 GB as5 boxes and still observed the= problem.  Interestingly about at the same point.

 <= /p>

Anything I can monitor= 230; perhaps I’ll hook the Yourkit profiler up to it to see if there = is some kind of leak?

 <= /p>

Wade

 <= /p>

From: Poziombk= a, Wade L
Sent: Monday, June 04, 2012 7:23 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

 

What JVM settings do you have?

        -Xms= 8G

        -Xmx8G

        -Xmn800m<= o:p>

        -XX:+= HeapDumpOnOutOfMemoryError

        -XX:+= UseParNewGC

        -XX:+= UseConcMarkSweepGC

        -XX:+= CMSParallelRemarkEnabled

        -XX:Survi= vorRatio=3D8

        -XX:MaxTe= nuringThreshold=3D1

        -XX:CMSIn= itiatingOccupancyFraction=3D75

        -XX:+= UseCMSInitiatingOccupancyOnly

        -Djava.rm= i.server.hostname=3D127.0.0.1

        -Djava.ne= t.preferIPv4Stack=3Dtrue

        -Dcassand= ra-pidfile=3Dcassandra.pid

 

 

What is the machine spec ? 

It is an RH AS5 x64

16gb memory

2 CPU cores 2.8 Ghz

 

As it turns out it is somewhat wimpier than I though= t.  While weak on it does have a good amount of memory.

It is paired with a larger machine.

 

 

What settings do you have for key and row cache ?&nb= sp;

A: All the defaults. (yaml “template” at= tached);

 

Do the CF's have secondary indexes ?

A: Yes one has two.  One of them is used in the= key slice used to get the row keys used to do the further mutations.<= /o:p>

 

How many clients / requests per second ? <= /o:p>

A: One client process with 10 threads connected to o= ne of the two nodes in the cluster.  On thread reading the slice and p= utting work in a queue.  9 others reading from this queue and applying= the mutations.  Mutations are completing at about 20,000/minute roughly.

 <= /p>

 <= /p>

From: aaron mo= rton [mailto:aaron@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

 

Had a look at the log, this message =

 

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 Stor= ageService.java (line 2772) Unable to reduce heap usage since there are no = dirty column families

appears correct, it happens after some flush activit= y and there are not CF's with memtable data. But the heap is still full.&nb= sp;

 

Overall the server is overloaded, but it seems like = it should be handling it better. 

 

What JVM settings do you have? What is the machine s= pec ? 

What settings do you have for key and row cache ?&nb= sp;

Do the CF's have secondary indexes ?

How many clients / requests per second ? <= /o:p>

 

Cheers

 

-----------------

Aaron Morton

Freelance Developer<= /o:p>

@aaronmorton

 

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:<= o:p>

 

Running a very write intensive (new column, delete o= ld column etc.) process and failing on memory.  Log file attached.

Curiously when I add new data I have never seen this have in past sent hund= reds of millions "new" transactions.  It seems to be when I = modify.  my process is as follows

key slice to get columns to modify in batches of 100, in separate threads m= odify those columns.  I advance the slice with the start key each with= last key in previous batch.  Mutations done are update a column value= in one column family(token), delete column and add new column in another (pan).

Runs well until after about 5 million rows then it seems to run out of memo= ry.  Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145)= Heap is 0.7967470834946492 full.  You may need to reduce memtable and= /or cache sizes.  Cassandra will now flush up to the two largest memta= bles to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automat= ically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2= 772) Unable to reduce heap usage since there are no dirty column families INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetA= ddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122)= GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 850604851= 2
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122)= GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; m= ax is 8506048512

----------------
Keyspace: keyspace
       Read Count: 50042632
       Read Latency: 0.23157864418482224= ms.
       Write Count: 44948323
       Write Latency: 0.0194608294729927= 97 ms.
       Pending Tasks: 0
            &nb= sp;  Column Family: pan
            &nb= sp;  SSTable count: 5
            &nb= sp;  Space used (live): 1977467326
            &nb= sp;  Space used (total): 1977467326
            &nb= sp;  Number of Keys (estimate): 16334848
            &nb= sp;  Memtable Columns Count: 0
            &nb= sp;  Memtable Data Size: 0
            &nb= sp;  Memtable Switch Count: 74
            &nb= sp;  Read Count: 14985122
            &nb= sp;  Read Latency: 0.408 ms.
            &nb= sp;  Write Count: 19972441
            &nb= sp;  Write Latency: 0.022 ms.
            &nb= sp;  Pending Tasks: 0
            &nb= sp;  Bloom Filter False Postives: 829
            &nb= sp;  Bloom Filter False Ratio: 0.00073
            &nb= sp;  Bloom Filter Space Used: 37048400
            &nb= sp;  Compacted row minimum size: 125
            &nb= sp;  Compacted row maximum size: 149
            &nb= sp;  Compacted row mean size: 149

            &nb= sp;  Column Family: token
            &nb= sp;  SSTable count: 4
            &nb= sp;  Space used (live): 1250973873
            &nb= sp;  Space used (total): 1250973873
            &nb= sp;  Number of Keys (estimate): 14217216
            &nb= sp;  Memtable Columns Count: 0
            &nb= sp;  Memtable Data Size: 0
            &nb= sp;  Memtable Switch Count: 49
            &nb= sp;  Read Count: 30059563
            &nb= sp;  Read Latency: 0.167 ms.
            &nb= sp;  Write Count: 14985488
            &nb= sp;  Write Latency: 0.014 ms.
            &nb= sp;  Pending Tasks: 0
            &nb= sp;  Bloom Filter False Postives: 13642
            &nb= sp;  Bloom Filter False Ratio: 0.00322
            &nb= sp;  Bloom Filter Space Used: 28002984
            &nb= sp;  Compacted row minimum size: 150
            &nb= sp;  Compacted row maximum size: 258
            &nb= sp;  Compacted row mean size: 224

            &nb= sp;  Column Family: counters
            &nb= sp;  SSTable count: 2
            &nb= sp;  Space used (live): 561549994
            &nb= sp;  Space used (total): 561549994
            &nb= sp;  Number of Keys (estimate): 9985024
            &nb= sp;  Memtable Columns Count: 0
            &nb= sp;  Memtable Data Size: 0
            &nb= sp;  Memtable Switch Count: 38
            &nb= sp;  Read Count: 4997947
            &nb= sp;  Read Latency: 0.092 ms.
            &nb= sp;  Write Count: 9990394
            &nb= sp;  Write Latency: 0.023 ms.
            &nb= sp;  Pending Tasks: 0
            &nb= sp;  Bloom Filter False Postives: 191
            &nb= sp;  Bloom Filter False Ratio: 0.37525
            &nb= sp;  Bloom Filter Space Used: 18741152
            &nb= sp;  Compacted row minimum size: 125
            &nb= sp;  Compacted row maximum size: 179
            &nb= sp;  Compacted row mean size: 150

----------------
<cassandra.zip>

 

--_000_BEF40F3557A6B24CB87D1F17775EA4FD0FF44382FMSMSX101amrcor_--