Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of skrolle@gmail.com designates
 209.85.160.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <EA14BF9D-FAB8-423E-8DBA-D9E8716FE1A8@thelastpickle.com>
References: 
 <CAN3fqkzHS1P_kHd_fqsf=k3TEFm30t8Svr-sTmAP_COb8cj2Uw@mail.gmail.com>
	<EA14BF9D-FAB8-423E-8DBA-D9E8716FE1A8@thelastpickle.com>
Date: Sat, 15 Sep 2012 17:23:21 +0200
Message-ID: 
 <CAN3fqkxjb=9J9Ta3pu4y6XaTdtQ21J8ZSPKmeGunOUN7dbgpFw@mail.gmail.com>
Subject: Re: Suggestions for tuning and monitoring.
From: =?ISO-8859-1?Q?Henrik_Schr=F6der?= <skrolle@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=20cf3011df95ed29c904c9bf1e23

--20cf3011df95ed29c904c9bf1e23
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

I started digging in the logfiles, and here's the full log of a crash. We
have 4GB of heap, and if you watch either OpsCenter or through a JMX
console, used heap size is always below 3GB, and it swings between 1.5 to
3GB ever 5 minutes or so. So why would it suddenly run out of heap space?


 INFO 14:08:50,078 Completed flushing
/data-ssd/cassandra/data/system/HintsColumnFamily-hd-10-Data.db (5328100
bytes)
 INFO 14:08:50,079 4103 MUTATION messages dropped in last 5000ms
 INFO 14:08:50,080 Pool Name                    Active   Pending   Blocked
 INFO 14:08:50,086 Writing Memtable-A@1734568216(117729674/147162092
serialized/live bytes, 12097 ops)
 INFO 14:08:50,089 ReadStage                        32       363         0
 INFO 14:08:50,089 RequestResponseStage              0         0         0
 INFO 14:08:50,089 ReadRepairStage                   1         1         0
 INFO 14:08:50,089 MutationStage                    32      5672         0
 INFO 14:08:50,090 ReplicateOnWriteStage             0         0         0
 INFO 14:08:50,090 GossipStage                       0         0         0
 INFO 14:08:50,090 AntiEntropyStage                  0         0         0
 INFO 14:08:50,090 MigrationStage                    0         0         0
 INFO 14:08:50,090 StreamStage                       0         0         0
 INFO 14:08:50,091 MemtablePostFlusher               1         6         0
 INFO 14:08:50,091 FlushWriter                       1         5         0
 INFO 14:08:50,091 MiscStage                         0         0         0
 INFO 14:08:50,091 InternalResponseStage             0         0         0
 INFO 14:08:50,094 Compacting
[SSTableReader(path=3D'/data-ssd/cassandra/data/system/HintsColumnFamily-hd=
-9-Data.db'),
SSTableReader(path=3D'/data-ssd/cassandra/data/system/HintsColumnFamily-hd-=
10-Data.db'),
SSTableReader(path=3D'/data-ssd/cassandra/data/system/HintsColumnFamily-hd-=
8-Data.db'),
SSTableReader(path=3D'/data-ssd/cassandra/data/system/HintsColumnFamily-hd-=
7-Data.db')]
 INFO 14:08:50,098 HintedHandoff                     0         0         0
 INFO 14:08:50,098 CompactionManager                 0         2
 INFO 14:08:50,099 MessagingService                n/a       0,1
 INFO 14:08:50,099 ColumnFamily                Memtable ops,data  Row cache
size/cap  Key cache size/cap
 INFO 14:08:50,099 system.NodeIdInfo
0,0                 0/0                 0/0
 INFO 14:08:50,099 system.IndexInfo
0,0                 0/0                 0/0
 INFO 14:08:50,099 system.LocationInfo
0,0                 0/0                 0/0
 INFO 14:08:50,100 system.Versions
0,0                 0/0                 0/0
 INFO 14:08:50,107 system.Migrations
0,0                 0/0                 0/0
 INFO 14:08:50,107 system.HintsColumnFamily
0,0                 0/0                 0/0
 INFO 14:08:50,107 system.Schema
0,0                 0/0                 0/0
 INFO 14:08:50,108 K.A
920,12311092                 0/0                 0/0
 INFO 14:08:50,127 K.B
111,1005040                 0/0           6000/6000
 INFO 14:08:50,127 K.C
13,23283                 0/0       211817/211817
 INFO 14:08:50,127 OpsCenter.rollups300
0,0                 0/0                 4/4
 INFO 14:08:50,128 OpsCenter.rollups60
1876,1452889                 0/0               30/30
 INFO 14:08:50,128 OpsCenter.rollups7200
0,0                 0/0               30/30
 INFO 14:08:50,137 Enqueuing flush of Memtable-rollups60@33961585(43148/145=
2889
serialized/live bytes, 1876 ops)
 INFO 14:08:50,160 OpsCenter.rollups86400
0,0                 0/0                 0/0
 INFO 14:08:50,169 OpsCenter.events
0,0                 0/0               13/13
 INFO 14:08:50,174 OpsCenter.events_timeline
0,0                 0/0                 0/0
 INFO 14:08:50,175 OpsCenter.settings
0,0                 0/0                 0/0
 INFO 14:08:50,175 OpsCenter.pdps
470,103983                 0/0                 0/0
 INFO 14:09:09,860 GC for ConcurrentMarkSweep: 2560 ms for 6 collections,
3669729184 used; max is 4118806528
 WARN 14:09:09,860 Heap is 0.8909690608317886 full.  You may need to reduce
memtable and/or cache sizes.  Cassandra will now flush up to the two
largest memtables to free up memory.  Adjust flush_largest_memtables_at
threshold in cassandra.yaml if you don't want Cassandra to do this
automatically
 WARN 14:09:09,860 Flushing CFS(Keyspace=3D'K', ColumnFamily=3D'A') to reli=
eve
memory pressure
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/lib/cassandra/java_1346605112.hprof ...
Heap dump file created [5553296481 bytes in 67.950 secs]
 INFO 14:11:54,810 InetAddress /10.100.1.116 is now dead.
 INFO 14:11:54,813 InetAddress /10.100.1.117 is now dead.
 INFO 14:11:54,814 InetAddress /10.100.1.118 is now dead.
ERROR 14:11:54,830 Fatal exception in thread
Thread[CompactionExecutor:565,1,main]
java.lang.OutOfMemoryError: Java heap space
        at
org.apache.cassandra.io.util.FastByteArrayOutputStream.expand(FastByteArray=
OutputStream.java:104)
        at
org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArrayO=
utputStream.java:220)
        at java.io.DataOutputStream.write(Unknown Source)
        at
org.apache.cassandra.utils.ByteBufferUtil.write(ByteBufferUtil.java:323)
        at
org.apache.cassandra.utils.ByteBufferUtil.writeWithLength(ByteBufferUtil.ja=
va:316)
        at
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:63=
)
        at
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:36=
1)
        at
org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:33=
4)
        at
org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFa=
milySerializer.java:89)
        at
org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.ja=
va:136)
        at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160=
)
        at
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.ja=
va:158)
        at
org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManag=
er.java:135)
        at
org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManag=
er.java:115)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
        at java.lang.Thread.run(Unknown Source)
 INFO 14:11:54,839 InetAddress /10.100.1.116 is now UP
 INFO 14:11:54,849 Stop listening to thrift clients
 INFO 14:11:54,850 InetAddress /10.100.1.117 is now UP
 INFO 14:11:54,850 InetAddress /10.100.1.118 is now UP
 INFO 14:11:54,862 Waiting for messaging service to quiesce
 INFO 14:12:06,909 MessagingService shutting down server thread.
ERROR 14:12:06,912 Fatal exception in thread Thread[Thread-117,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
shut down
        at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExec=
ution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515=
)
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpCo=
nnection.java:157)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.ja=
va:115)
ERROR 14:12:06,913 Fatal exception in thread Thread[Thread-116,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
shut down
        at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExec=
ution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515=
)
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpCo=
nnection.java:157)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.ja=
va:115)
ERROR 14:12:19,466 Fatal exception in thread Thread[Thread-118,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has
shut down
        at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExec=
ution(DebuggableThreadPoolExecutor.java:60)
        at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
        at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:515=
)
        at
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpCo=
nnection.java:157)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.ja=
va:115)
 INFO 14:12:19,466 Started hinted handoff for token: 1 with IP: /
10.100.1.116
 INFO 14:13:33,343 Completed flushing
/data-ssd/cassandra/data/K/A-hd-279743-Data.db (43508009 bytes)
 INFO 14:13:33,346 flushing high-traffic column family
CFS(Keyspace=3D'OpsCenter', ColumnFamily=3D'rollups7200') (estimated 0 byte=
s)
 INFO 14:13:33,350 flushing high-traffic column family
CFS(Keyspace=3D'OpsCenter', ColumnFamily=3D'rollups86400') (estimated 0 byt=
es)
 INFO 14:13:33,351 Writing Memtable-B@797498287(674046/53923680
serialized/live bytes, 1388162 ops)
 INFO 14:13:33,349 Compacting
[SSTableReader(path=3D'/data-ssd/cassandra/data/K/A-hd-279741-Data.db'),
SSTableReader(path=3D'/data-ssd/cassandra/data/K/A-hd-279235-Data.db'),
SSTableReader(path=3D'/data-ssd/cassandra/data/K/A-hd-279743-Data.db'),
SSTableReader(path=3D'/data-ssd/cassandra/data/K/A-hd-279506-Data.db')]
 INFO 14:13:33,351 flushing high-traffic column family
CFS(Keyspace=3D'OpsCenter', ColumnFamily=3D'events') (estimated 0 bytes)
 INFO 14:13:33,350 Enqueuing flush of Memtable-A@884602824(10126517/1265814=
6
serialized/live bytes, 948 ops)
 INFO 14:13:33,369 flushing high-traffic column family
CFS(Keyspace=3D'OpsCenter', ColumnFamily=3D'events_timeline') (estimated 0
bytes)
 INFO 14:13:33,369 flushing high-traffic column family
CFS(Keyspace=3D'OpsCenter', ColumnFamily=3D'settings') (estimated 0 bytes)
 INFO 14:13:33,369 flushing high-traffic column family
CFS(Keyspace=3D'OpsCenter', ColumnFamily=3D'pdps') (estimated 103983 bytes)
 INFO 14:14:50,122 Completed flushing
/data-ssd/cassandra/data/K/B-hd-14662-Data.db (5340050 bytes)
 INFO 14:14:50,123 26225 MUTATION messages dropped in last 5000ms
 INFO 14:14:50,123 Writing Memtable-C@841365631(436486/8776427
serialized/live bytes, 8752 ops)
 INFO 14:14:50,123 9 READ_REPAIR messages dropped in last 5000ms
 INFO 14:14:50,131 1519 READ messages dropped in last 5000ms
 INFO 14:14:50,146 Enqueuing flush of Memtable-pdps@1698224970(39228/103983
serialized/live bytes, 470 ops)
 INFO 14:14:50,167 Pool Name                    Active   Pending   Blocked
 INFO 14:14:50,169 ReadStage                        20        20         0
 INFO 14:14:50,169 RequestResponseStage              0         0         0
 INFO 14:14:50,170 ReadRepairStage                   1         1         0
 INFO 14:14:50,170 ReplicateOnWriteStage             0         0         0
 INFO 14:14:50,171 GossipStage                       0         0         0
 INFO 14:14:50,171 AntiEntropyStage                  0         0         0
 INFO 14:14:50,171 MigrationStage                    0         0         0
 INFO 14:14:50,172 StreamStage                       0         0         0
 INFO 14:14:50,172 MemtablePostFlusher               1         5         0
 INFO 14:14:50,172 FlushWriter                       1         5         1
 INFO 14:14:50,173 MiscStage                         0         0         0
 INFO 14:14:50,173 InternalResponseStage             0         0         0
 INFO 14:14:50,174 HintedHandoff                     1         3         0
 INFO 14:14:50,174 CompactionManager                 1         4
 INFO 14:14:50,174 MessagingService                n/a       0,0
 INFO 14:14:50,174 ColumnFamily                Memtable ops,data  Row cache
size/cap  Key cache size/cap
 INFO 14:14:50,194 system.NodeIdInfo
0,0                 0/0                 0/0
 INFO 14:14:50,194 system.IndexInfo
0,0                 0/0                 0/0
 INFO 14:14:50,194 system.LocationInfo
0,0                 0/0                 0/0
 INFO 14:14:50,195 system.Versions
0,0                 0/0                 0/0
 INFO 14:14:50,195 system.Migrations
0,0                 0/0                 0/0
 INFO 14:14:50,195 system.HintsColumnFamily
13,964160                 0/0                 0/0
 INFO 14:14:50,195 system.Schema
0,0                 0/0                 0/0
 INFO 14:14:50,195 K.A
0,0                 0/0                 0/0
 INFO 14:14:50,195 K.B
115,1037040                 0/0           6000/6000
 INFO 14:14:50,196 K.C
13,23283                 0/0       211817/211817
 INFO 14:14:50,196 OpsCenter.rollups300
0,0                 0/0                 4/4
 INFO 14:14:50,196 OpsCenter.rollups60
0,0                 0/0               30/30
 INFO 14:14:50,196 OpsCenter.rollups7200
0,0                 0/0               30/30
 INFO 14:14:50,196 OpsCenter.rollups86400
0,0                 0/0                 0/0
 INFO 14:14:50,196 OpsCenter.events
0,0                 0/0               13/13
 INFO 14:14:50,197 OpsCenter.events_timeline
0,0                 0/0                 0/0
 INFO 14:14:50,197 OpsCenter.settings
0,0                 0/0                 0/0
 INFO 14:14:50,197 OpsCenter.pdps
0,0                 0/0                 0/0
Service exit with a return value of 100


On Tue, Sep 11, 2012 at 3:57 AM, aaron morton <aaron@thelastpickle.com>wrot=
e:

> It's impossible to start new connections, or impossible to send requests,
> or it just doesn't return anything when you've sent a request.
>
> If it's totally frozen it sounds like GC. How long does it freeze for?
>
> Despite that, we occasionally get OOM exceptions, and nodes crashing,
> maybe a few times per month.
>
> Do you have an error stack ?
>
> We can't find anything in the cassandra logs indicating that something's =
up
>
> Is it logging dropped messages or high TP pending ? Are the freezes
> associated with compaction or repair running?
>
>  and occasionally we do bulk deletion of supercolumns in a row.
>
> mmm, are you sending a batch mutation with lots-o-deletes ? Each row
> mutation (insert or delete) in the batch becomes a thread pool tasks. If
> you send 1,000 rows in a batch you will temporarily prevent other request=
s
> from being served.
>
> The config options we are unsure about are things like commit log sizes, =
=85.
>
> I would try to find some indication of what's going on before tweaking.
> Have you checked iostat ?
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/09/2012, at 2:05 AM, Henrik Schr=F6der <skrolle@gmail.com> wrote:
>
> Hi all,
>
> We're running a small Cassandra cluster (v1.0.10) serving data to our web
> application, and as our traffic grows, we're starting to see some weird
> issues. The biggest of these is that sometimes, a single node becomes
> unresponsive. It's impossible to start new connections, or impossible to
> send requests, or it just doesn't return anything when you've sent a
> request. Our client library is set to retry on another server when this
> happens, and what we see then is that the request is usually served
> instantly. So it's not the case that some requests are very difficult, it=
's
> that sometimes a node is just "busy", and we have no idea why or what it'=
s
> doing.
>
> We're using MRTG and Monit to monitor the servers, and in MRTG the averag=
e
> CPU usage is around 5%, on our quad-core Xeon servers with SSDs. But
> occassionally through Monit we can see that the 1-min load average goes
> above 4, and this usually corresponds to the above issues. Is this common=
?
> Does this happen to everyone else? And why the spikiness in load? We can'=
t
> find anything in the cassandra logs indicating that something's up (such =
as
> a slow GC or compaction), and there's no corresponding traffic spike in t=
he
> application either. Should we just add more nodes if any single one gets
> CPU spikes?
>
> Another explanation could also be that we've configured it wrong. We're
> running pretty much default config. Each node has 16G of RAM, 4GB of heap=
,
> no row-cache and an sizeable key-cache. Despite that, we occasionally get
> OOM exceptions, and nodes crashing, maybe a few times per month. Should w=
e
> increase heap size? Or move to 1.1 and enable off-heap caching?
>
> We have quite a lot of traffic to the cluster. A single keyspace with two
> column families, RF=3D3, compression is enabled, and we're running the
> default size-tiered compaction.
> Column family A has 60GB of actual data, each row has a single column, an=
d
> that column holds binary data that varies in size up to 500kB. When we
> update a row, we write a new value to this single column, effectively
> replacing that entire row. We do ~1000 updates/s, totalling ~10MB/s in
> writes.
> Column family B also has 60GB of actual data, but each row has a variable
> (~100-10000) number of supercolumns, and each supercolumn has a fixed
> number of columns with a fixed amount of data, totalling ~1kB. The
> operations we are doing on this column family is that we add supercolumns
> to rows with a rate of ~200/s, and occasionally we do bulk deletion of
> supercolumns in a row.
>
> The config options we are unsure about are things like commit log sizes,
> memtable flushing thresholds, commit log syncing, compaction throughput,
> etc. Are we at a point with our data size and write load that the default=
s
> aren't good enough anymore? Should we stick with size-tiered compaction,
> even though our application is update-heavy?
>
>
> Many thanks,
> /Henrik
>
>
>

--20cf3011df95ed29c904c9bf1e23
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

I started digging in the logfiles, and here&#39;s the full log of a crash. =
We have 4GB of heap, and if you watch either OpsCenter or through a JMX con=
sole, used heap size is always below 3GB, and it swings between 1.5 to 3GB =
ever 5 minutes or so. So why would it suddenly run out of heap space?<br>
<br><br>=A0INFO 14:08:50,078 Completed flushing /data-ssd/cassandra/data/sy=
stem/HintsColumnFamily-hd-10-Data.db (5328100 bytes)<br>=A0INFO 14:08:50,07=
9 4103 MUTATION messages dropped in last 5000ms<br>=A0INFO 14:08:50,080 Poo=
l Name=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 Active=A0=
=A0 Pending=A0=A0 Blocked<br>
=A0INFO 14:08:50,086 Writing Memtable-A@1734568216(117729674/147162092 seri=
alized/live bytes, 12097 ops)<br>=A0INFO 14:08:50,089 ReadStage=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 32=A0=A0=A0=A0=A0=
=A0 363=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=A0INFO 14:08:50,089 RequestResponseSt=
age=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=
=A0=A0=A0=A0=A0=A0=A0 0<br>
=A0INFO 14:08:50,089 ReadRepairStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=
=A0INFO 14:08:50,089 MutationStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 32=A0=A0=A0=A0=A0 5672=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=A0I=
NFO 14:08:50,090 ReplicateOnWriteStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 =
0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br>
=A0INFO 14:08:50,090 GossipStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=
=A0 0<br>=A0INFO 14:08:50,090 AntiEntropyStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=
=A0 0<br>=A0INFO 14:08:50,090 MigrationStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=
=A0=A0 0<br>
=A0INFO 14:08:50,090 StreamStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=
=A0 0<br>=A0INFO 14:08:50,091 MemtablePostFlusher=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 6=A0=A0=A0=A0=A0=A0=A0=A0 0<br=
>=A0INFO 14:08:50,091 FlushWriter=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 5=A0=A0=A0=A0=A0=A0=A0=
=A0 0<br>
=A0INFO 14:08:50,091 MiscStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=
=A0=A0 0<br>=A0INFO 14:08:50,091 InternalResponseStage=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=
=A0INFO 14:08:50,094 Compacting [SSTableReader(path=3D&#39;/data-ssd/cassan=
dra/data/system/HintsColumnFamily-hd-9-Data.db&#39;), SSTableReader(path=3D=
&#39;/data-ssd/cassandra/data/system/HintsColumnFamily-hd-10-Data.db&#39;),=
 SSTableReader(path=3D&#39;/data-ssd/cassandra/data/system/HintsColumnFamil=
y-hd-8-Data.db&#39;), SSTableReader(path=3D&#39;/data-ssd/cassandra/data/sy=
stem/HintsColumnFamily-hd-7-Data.db&#39;)]<br>
=A0INFO 14:08:50,098 HintedHandoff=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=
<br>=A0INFO 14:08:50,098 CompactionManager=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 2<br>=A0INFO 14:08:50,099 Messagi=
ngService=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 n/a=A0=A0=A0=A0=A0=
=A0 0,1<br>=A0INFO 14:08:50,099 ColumnFamily=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 Memtable ops,data=A0 Row cache size/cap=A0 Key cache size/c=
ap<br>
=A0INFO 14:08:50,099 system.NodeIdInfo=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=
=A0INFO 14:08:50,099 system.IndexInfo=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<=
br>=A0INFO 14:08:50,099 system.LocationInfo=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>
=A0INFO 14:08:50,100 system.Versions=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<=
br>=A0INFO 14:08:50,107 system.Migrations=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<=
br>=A0INFO 14:08:50,107 system.HintsColumnFamily=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>
=A0INFO 14:08:50,107 system.Schema=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0<br>=A0INFO 14:08:50,108 K.A=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 920,12311092=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0/0<br>=A0INFO 14:08:50,127 K.B=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 111,1005040=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0 6000/6000<br>
=A0INFO 14:08:50,127 K.C=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 13,23283=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0 211817/211817<br>=A0IN=
FO 14:08:50,127 OpsCenter.rollups300=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 4/4<br>=A0INFO 14:0=
8:50,128 OpsCenter.rollups60=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 1876,14=
52889=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 30/30<br>
=A0INFO 14:08:50,128 OpsCenter.rollups7200=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 30/30<br>=A0INFO 14:08=
:50,137 Enqueuing flush of Memtable-rollups60@33961585(43148/1452889 serial=
ized/live bytes, 1876 ops)<br>
=A0INFO 14:08:50,160 OpsCenter.rollups86400=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0INFO 1=
4:08:50,169 OpsCenter.events=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 13/13<br>=A0INFO 14=
:08:50,174 OpsCenter.events_timeline=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>
=A0INFO 14:08:50,175 OpsCenter.settings=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0=
INFO 14:08:50,175 OpsCenter.pdps=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 470,103983=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0INFO 14:0=
9:09,860 GC for ConcurrentMarkSweep: 2560 ms for 6 collections, 3669729184 =
used; max is 4118806528<br>
=A0WARN 14:09:09,860 Heap is 0.8909690608317886 full.=A0 You may need to re=
duce memtable and/or cache sizes.=A0 Cassandra will now flush up to the two=
 largest memtables to free up memory.=A0 Adjust flush_largest_memtables_at =
threshold in cassandra.yaml if you don&#39;t want Cassandra to do this auto=
matically<br>
=A0WARN 14:09:09,860 Flushing CFS(Keyspace=3D&#39;K&#39;, ColumnFamily=3D&#=
39;A&#39;) to relieve memory pressure<br>java.lang.OutOfMemoryError: Java h=
eap space<br>Dumping heap to /var/lib/cassandra/java_1346605112.hprof ...<b=
r>
Heap dump file created [5553296481 bytes in 67.950 secs]<br>=A0INFO 14:11:5=
4,810 InetAddress /<a href=3D"http://10.100.1.116">10.100.1.116</a> is now =
dead.<br>=A0INFO 14:11:54,813 InetAddress /<a href=3D"http://10.100.1.117">=
10.100.1.117</a> is now dead.<br>
=A0INFO 14:11:54,814 InetAddress /<a href=3D"http://10.100.1.118">10.100.1.=
118</a> is now dead.<br>ERROR 14:11:54,830 Fatal exception in thread Thread=
[CompactionExecutor:565,1,main]<br>java.lang.OutOfMemoryError: Java heap sp=
ace<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.io.util.FastByteArrayOutputSt=
ream.expand(FastByteArrayOutputStream.java:104)<br>=A0=A0=A0=A0=A0=A0=A0 at=
 org.apache.cassandra.io.util.FastByteArrayOutputStream.write(FastByteArray=
OutputStream.java:220)<br>
=A0=A0=A0=A0=A0=A0=A0 at java.io.DataOutputStream.write(Unknown Source)<br>=
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.utils.ByteBufferUtil.write(By=
teBufferUtil.java:323)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.uti=
ls.ByteBufferUtil.writeWithLength(ByteBufferUtil.java:316)<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.db.ColumnSerializer.serialize=
(ColumnSerializer.java:63)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra=
.db.SuperColumnSerializer.serialize(SuperColumn.java:361)<br>=A0=A0=A0=A0=
=A0=A0=A0 at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperC=
olumn.java:334)<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.db.ColumnFamilySerializer.ser=
ializeForSSTable(ColumnFamilySerializer.java:89)<br>=A0=A0=A0=A0=A0=A0=A0 a=
t org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.=
java:136)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.io.sstable.SSTab=
leWriter.append(SSTableWriter.java:160)<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.db.compaction.CompactionTask.=
execute(CompactionTask.java:158)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cas=
sandra.db.compaction.CompactionManager$1.call(CompactionManager.java:135)<b=
r>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.db.compaction.CompactionMan=
ager$1.call(CompactionManager.java:115)<br>
=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.FutureTask$Sync.innerRun(Unkn=
own Source)<br>=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.FutureTask.run=
(Unknown Source)<br>=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoo=
lExecutor$Worker.runTask(Unknown Source)<br>
=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExecutor$Worker.run=
(Unknown Source)<br>=A0=A0=A0=A0=A0=A0=A0 at java.lang.Thread.run(Unknown S=
ource)<br>=A0INFO 14:11:54,839 InetAddress /<a href=3D"http://10.100.1.116"=
>10.100.1.116</a> is now UP<br>
=A0INFO 14:11:54,849 Stop listening to thrift clients<br>=A0INFO 14:11:54,8=
50 InetAddress /<a href=3D"http://10.100.1.117">10.100.1.117</a> is now UP<=
br>=A0INFO 14:11:54,850 InetAddress /<a href=3D"http://10.100.1.118">10.100=
.1.118</a> is now UP<br>
=A0INFO 14:11:54,862 Waiting for messaging service to quiesce<br>=A0INFO 14=
:12:06,909 MessagingService shutting down server thread.<br>ERROR 14:12:06,=
912 Fatal exception in thread Thread[Thread-117,5,main]<br>java.util.concur=
rent.RejectedExecutionException: ThreadPoolExecutor has shut down<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.concurrent.DebuggableThreadPo=
olExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)<br>=A0=
=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExecutor.reject(Unknow=
n Source)<br>=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExecut=
or.execute(Unknown Source)<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.net.MessagingService.receive(=
MessagingService.java:515)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra=
.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:157)<b=
r>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.net.IncomingTcpConnection.r=
un(IncomingTcpConnection.java:115)<br>
ERROR 14:12:06,913 Fatal exception in thread Thread[Thread-116,5,main]<br>j=
ava.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut=
 down<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.concurrent.Debuggabl=
eThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:6=
0)<br>
=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExecutor.reject(Unk=
nown Source)<br>=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExe=
cutor.execute(Unknown Source)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassan=
dra.net.MessagingService.receive(MessagingService.java:515)<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.net.IncomingTcpConnection.rec=
eiveMessage(IncomingTcpConnection.java:157)<br>=A0=A0=A0=A0=A0=A0=A0 at org=
.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:=
115)<br>ERROR 14:12:19,466 Fatal exception in thread Thread[Thread-118,5,ma=
in]<br>
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shu=
t down<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.concurrent.Debuggab=
leThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:=
60)<br>=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExecutor.rej=
ect(Unknown Source)<br>
=A0=A0=A0=A0=A0=A0=A0 at java.util.concurrent.ThreadPoolExecutor.execute(Un=
known Source)<br>=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.net.Messagin=
gService.receive(MessagingService.java:515)<br>=A0=A0=A0=A0=A0=A0=A0 at org=
.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConne=
ction.java:157)<br>
=A0=A0=A0=A0=A0=A0=A0 at org.apache.cassandra.net.IncomingTcpConnection.run=
(IncomingTcpConnection.java:115)<br>=A0INFO 14:12:19,466 Started hinted han=
doff for token: 1 with IP: /<a href=3D"http://10.100.1.116">10.100.1.116</a=
><br>=A0INFO 14:13:33,343 Completed flushing /data-ssd/cassandra/data/K/A-h=
d-279743-Data.db (43508009 bytes)<br>
=A0INFO 14:13:33,346 flushing high-traffic column family CFS(Keyspace=3D=
9;OpsCenter&#39;, ColumnFamily=3D&#39;rollups7200&#39;) (estimated 0 bytes)=
<br>=A0INFO 14:13:33,350 flushing high-traffic column family CFS(Keyspace=
=3D&#39;OpsCenter&#39;, ColumnFamily=3D&#39;rollups86400&#39;) (estimated 0=
 bytes)<br>
=A0INFO 14:13:33,351 Writing Memtable-B@797498287(674046/53923680 serialize=
d/live bytes, 1388162 ops)<br>=A0INFO 14:13:33,349 Compacting [SSTableReade=
r(path=3D&#39;/data-ssd/cassandra/data/K/A-hd-279741-Data.db&#39;), SSTable=
Reader(path=3D&#39;/data-ssd/cassandra/data/K/A-hd-279235-Data.db&#39;), SS=
TableReader(path=3D&#39;/data-ssd/cassandra/data/K/A-hd-279743-Data.db&#39;=
), SSTableReader(path=3D&#39;/data-ssd/cassandra/data/K/A-hd-279506-Data.db=
&#39;)]<br>
=A0INFO 14:13:33,351 flushing high-traffic column family CFS(Keyspace=3D=
9;OpsCenter&#39;, ColumnFamily=3D&#39;events&#39;) (estimated 0 bytes)<br>=
=A0INFO 14:13:33,350 Enqueuing flush of Memtable-A@884602824(10126517/12658=
146 serialized/live bytes, 948 ops)<br>
=A0INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace=3D=
9;OpsCenter&#39;, ColumnFamily=3D&#39;events_timeline&#39;) (estimated 0 by=
tes)<br>=A0INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspa=
ce=3D&#39;OpsCenter&#39;, ColumnFamily=3D&#39;settings&#39;) (estimated 0 b=
ytes)<br>
=A0INFO 14:13:33,369 flushing high-traffic column family CFS(Keyspace=3D=
9;OpsCenter&#39;, ColumnFamily=3D&#39;pdps&#39;) (estimated 103983 bytes)<b=
r>=A0INFO 14:14:50,122 Completed flushing /data-ssd/cassandra/data/K/B-hd-1=
4662-Data.db (5340050 bytes)<br>
=A0INFO 14:14:50,123 26225 MUTATION messages dropped in last 5000ms<br>=A0I=
NFO 14:14:50,123 Writing Memtable-C@841365631(436486/8776427 serialized/liv=
e bytes, 8752 ops)<br>=A0INFO 14:14:50,123 9 READ_REPAIR messages dropped i=
n last 5000ms<br>
=A0INFO 14:14:50,131 1519 READ messages dropped in last 5000ms<br>=A0INFO 1=
4:14:50,146 Enqueuing flush of Memtable-pdps@1698224970(39228/103983 serial=
ized/live bytes, 470 ops)<br>=A0INFO 14:14:50,167 Pool Name=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 Active=A0=A0 Pending=A0=A0 Block=
ed<br>
=A0INFO 14:14:50,169 ReadStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 20=A0=A0=A0=A0=A0=A0=A0 20=A0=A0=A0=A0=A0=A0=A0=A0=
 0<br>=A0INFO 14:14:50,169 RequestResponseStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=A0IN=
FO 14:14:50,170 ReadRepairStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 0<br>
=A0INFO 14:14:50,170 ReplicateOnWriteStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=A0INFO 14:14:=
50,171 GossipStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br>=A0INFO =
14:14:50,171 AntiEntropyStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br>
=A0INFO 14:14:50,171 MigrationStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0<br=
>=A0INFO 14:14:50,172 StreamStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=
=A0 0<br>=A0INFO 14:14:50,172 MemtablePostFlusher=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 5=A0=A0=A0=A0=A0=A0=A0=A0 0<br=
>
=A0INFO 14:14:50,172 FlushWriter=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 5=A0=A0=A0=A0=A0=A0=A0=
=A0 1<br>=A0INFO 14:14:50,173 MiscStage=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=
=A0=A0=A0=A0=A0 0<br>=A0INFO 14:14:50,173 InternalResponseStage=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=A0 0=A0=A0=A0=A0=A0=A0=A0=
=A0 0<br>
=A0INFO 14:14:50,174 HintedHandoff=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 3=A0=A0=A0=A0=A0=A0=A0=A0 0=
<br>=A0INFO 14:14:50,174 CompactionManager=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 1=A0=A0=A0=A0=A0=A0=A0=A0 4<br>=A0INFO 14:14:50,174 Messagi=
ngService=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 n/a=A0=A0=A0=A0=A0=
=A0 0,0<br>=A0INFO 14:14:50,174 ColumnFamily=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 Memtable ops,data=A0 Row cache size/cap=A0 Key cache size/c=
ap<br>
=A0INFO 14:14:50,194 system.NodeIdInfo=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=
=A0INFO 14:14:50,194 system.IndexInfo=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<=
br>=A0INFO 14:14:50,194 system.LocationInfo=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>
=A0INFO 14:14:50,195 system.Versions=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<=
br>=A0INFO 14:14:50,195 system.Migrations=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<=
br>=A0INFO 14:14:50,195 system.HintsColumnFamily=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0 13,964160=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>
=A0INFO 14:14:50,195 system.Schema=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0<br>=A0INFO 14:14:50,195 K.A=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0INFO 14:14:50,195 K.B=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 115,1037040=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 6000/6000<br>
=A0INFO 14:14:50,196 K.C=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 13,23283=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=A0=A0=A0=A0 211817/211817<br>=A0IN=
FO 14:14:50,196 OpsCenter.rollups300=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 4/4<br>=A0INFO 14:1=
4:50,196 OpsCenter.rollups60=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0=
/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 30/30<br>
=A0INFO 14:14:50,196 OpsCenter.rollups7200=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 30/30<br>=A0INFO 14:14=
:50,196 OpsCenter.rollups86400=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0INFO 14:14:50,196 OpsC=
enter.events=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 13/13<br>
=A0INFO 14:14:50,197 OpsCenter.events_timeline=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0=
/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0INFO 14:14:50,=
197 OpsCenter.settings=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0=
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>=A0INFO 14:14:50,19=
7 OpsCenter.pdps=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 0,0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=
=A0 0/0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 0/0<br>
Service exit with a return value of 100<br><br><br><br><div class=3D"gmail_=
quote">On Tue, Sep 11, 2012 at 3:57 AM, aaron morton <span dir=3D"ltr">&lt;=
<a href=3D"mailto:aaron@thelastpickle.com" target=3D"_blank">aaron@thelastp=
ickle.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div style=3D"word-wrap:break-word"><div cla=
ss=3D"im"><blockquote type=3D"cite">It&#39;s impossible to start new connec=
tions, or impossible to send requests, or it just doesn&#39;t return anythi=
ng when you&#39;ve sent a request.</blockquote>
</div><div>If it&#39;s totally frozen it sounds like GC. How long does it f=
reeze for?</div><div><div class=3D"im"><br><blockquote type=3D"cite">Despit=
e that, we occasionally get OOM exceptions, and nodes crashing, maybe a few=
 times per month.=A0</blockquote>
</div>Do you have an error stack ?<div><br></div><div><div class=3D"im"><bl=
ockquote type=3D"cite">We can&#39;t find anything in the cassandra logs ind=
icating that something&#39;s up</blockquote></div>Is it logging dropped mes=
sages or high TP pending ? Are the freezes associated with compaction or re=
pair running?</div>
<div><br></div><div><div class=3D"im"><blockquote type=3D"cite">=A0and occa=
sionally we do bulk deletion of supercolumns in a row.</blockquote></div>mm=
m, are you sending a batch mutation with lots-o-deletes ? Each row mutation=
 (insert or delete) in the batch becomes a thread pool tasks. If you send 1=
,000 rows in a batch you will temporarily prevent other requests from being=
 served.</div>
<div><br></div><div><blockquote type=3D"cite">The config options we are uns=
ure about are things like commit log sizes,=A0=85.</blockquote>I would try =
to find some indication of what&#39;s going on before tweaking. Have you ch=
ecked iostat ?</div>
<div><br></div><div>Hope that helps.=A0</div><div><br><div>
<span style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;te=
xt-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:norm=
al;border-collapse:separate;text-transform:none;font-size:medium;white-spac=
e:normal;font-family:Helvetica;word-spacing:0px"><span style=3D"text-indent=
:0px;letter-spacing:normal;font-variant:normal;font-style:normal;font-weigh=
t:normal;line-height:normal;border-collapse:separate;text-transform:none;fo=
nt-size:medium;white-space:normal;font-family:Helvetica;word-spacing:0px"><=
div style=3D"word-wrap:break-word">
<span style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;fo=
nt-style:normal;font-weight:normal;line-height:normal;border-collapse:separ=
ate;text-transform:none;font-size:medium;white-space:normal;font-family:Hel=
vetica;word-spacing:0px"><div style=3D"word-wrap:break-word">
<span style=3D"text-indent:0px;letter-spacing:normal;font-variant:normal;fo=
nt-style:normal;font-weight:normal;line-height:normal;border-collapse:separ=
ate;text-transform:none;font-size:medium;white-space:normal;font-family:Hel=
vetica;word-spacing:0px"><div style=3D"word-wrap:break-word">
<div><div>-----------------</div><div>Aaron Morton</div><div>Freelance Deve=
loper</div><div>@aaronmorton</div><div><a href=3D"http://www.thelastpickle.=
com" target=3D"_blank">http://www.thelastpickle.com</a></div></div></div></=
span></div>
</span></div></span></span>
</div><div><div class=3D"h5">

<br><div><div>On 11/09/2012, at 2:05 AM, Henrik Schr=F6der &lt;<a href=3D"m=
ailto:skrolle@gmail.com" target=3D"_blank">skrolle@gmail.com</a>&gt; wrote:=
</div><br><blockquote type=3D"cite">Hi all,<br><br>We&#39;re running a smal=
l Cassandra cluster (v1.0.10) serving data to our web application, and as o=
ur traffic grows, we&#39;re starting to see some weird issues. The biggest =
of these is that sometimes, a single node becomes unresponsive. It&#39;s im=
possible to start new connections, or impossible to send requests, or it ju=
st doesn&#39;t return anything when you&#39;ve sent a request. Our client l=
ibrary is set to retry on another server when this happens, and what we see=
 then is that the request is usually served instantly. So it&#39;s not the =
case that some requests are very difficult, it&#39;s that sometimes a node =
is just &quot;busy&quot;, and we have no idea why or what it&#39;s doing.<b=
r>

<br>We&#39;re using MRTG and Monit to monitor the servers, and in MRTG the =
average CPU usage is around 5%, on our quad-core Xeon servers with SSDs. Bu=
t occassionally through Monit we can see that the 1-min load average goes a=
bove 4, and this usually corresponds to the above issues. Is this common? D=
oes this happen to everyone else? And why the spikiness in load? We can&#39=
;t find anything in the cassandra logs indicating that something&#39;s up (=
such as a slow GC or compaction), and there&#39;s no corresponding traffic =
spike in the application either. Should we just add more nodes if any singl=
e one gets CPU spikes?<br>

<br>Another explanation could also be that we&#39;ve configured it wrong. W=
e&#39;re running pretty much default config. Each node has 16G of RAM, 4GB =
of heap, no row-cache and an sizeable key-cache. Despite that, we occasiona=
lly get OOM exceptions, and nodes crashing, maybe a few times per month. Sh=
ould we increase heap size? Or move to 1.1 and enable off-heap caching?<br>

<br>We have quite a lot of traffic to the cluster. A single keyspace with t=
wo column families, RF=3D3, compression is enabled, and we&#39;re running t=
he default size-tiered compaction.<br>Column family A has 60GB of actual da=
ta, each row has a single column, and that column holds binary data that va=
ries in size up to 500kB. When we update a row, we write a new value to thi=
s single column, effectively replacing that entire row. We do ~1000 updates=
/s, totalling ~10MB/s in writes.<br>

Column family B also has 60GB of actual data, but each row has a variable (=
~100-10000) number of supercolumns, and each supercolumn has a fixed number=
 of columns with a fixed amount of data, totalling ~1kB. The operations we =
are doing on this column family is that we add supercolumns to rows with a =
rate of ~200/s, and occasionally we do bulk deletion of supercolumns in a r=
ow.<br>

<br>The config options we are unsure about are things like commit log sizes=
, memtable flushing thresholds, commit log syncing, compaction throughput, =
etc. Are we at a point with our data size and write load that the defaults =
aren&#39;t good enough anymore? Should we stick with size-tiered compaction=
, even though our application is update-heavy?<br>

<br><br>Many thanks,<br>/Henrik<br>
</blockquote></div><br></div></div></div></div></div></blockquote></div><br=
>

--20cf3011df95ed29c904c9bf1e23--