Subject: Re: Cassandra compaction stuck? Should I disable?
From: "PenguinWhispererThe ." <th3penguinwhisperer@gmail.com>
Date: Wed, 2 Dec 2015 11:21:07 +0100
To: Sebastian Estevez
Cc: Robert Coli, user@cassandra.apache.org

So it seems I found the problem.

The node opening a stream is waiting for the other node to respond, but that node never responds due to a broken pipe, which makes Cassandra wait forever.
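A rough way to see this from the outside, sketched here with placeholder host names (point nodetool at the nodes involved in the stream):

# Active stream sessions and their progress; counters that never move between
# runs are a good hint that a stream is hanging.
nodetool -h node1.example.com netstats

# Check the peer named in that session as well; if it no longer lists the
# session, its end of the connection has most likely died (broken pipe).
nodetool -h node2.example.com netstats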
It's basically this issue: https://issues.apache.org/jira/browse/CASSANDRA-8472
And this is the workaround/fix: https://issues.apache.org/jira/browse/CASSANDRA-8611

So:
- update Cassandra to >= 2.0.11
- add the option streaming_socket_timeout_in_ms = 10000
- do a rolling restart of Cassandra
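A minimal sketch of that workaround, assuming a package install with the config at /etc/cassandra/cassandra.yaml and an init service named "cassandra" (both are assumptions; adjust for your layout), applied one node at a time:

# /etc/cassandra/cassandra.yaml -- give streaming sockets a timeout so a dead
# peer can no longer hang a stream forever:
#   streaming_socket_timeout_in_ms: 10000

nodetool drain                     # flush memtables and stop accepting traffic
sudo service cassandra restart     # pick up the new setting
nodetool status                    # wait until all nodes report UN before moving on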
What's weird is that the IOException: Broken pipe is never shown in my logs (not on any node), and my logging is set to INFO in the log4j config.
I have this config in log4j-server.properties:

# output messages into a rolling log file as well as stdout
log4j.rootLogger=INFO,stdout,R

# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n

# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
# Edit the next line to point to your logs directory
log4j.appender.R.File=/var/log/cassandra/system.log

# Application logging options
#log4j.logger.org.apache.cassandra=DEBUG
#log4j.logger.org.apache.cassandra.db=DEBUG
#log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG

# Adding this to avoid thrift logging disconnect errors.
log4j.logger.org.apache.thrift.server.TNonblockingServer=ERROR

Too bad nobody else could point me to those. Hopefully this saves someone else from wasting a lot of time.

2015-11-11 15:42 GMT+01:00 Sebastian Estevez <sebastian.estevez@datastax.com>:

> Use 'nodetool compactionhistory'
>
> All the best,
>
> Sebastián
>
> On Nov 11, 2015 3:23 AM, "PenguinWhispererThe ." <th3penguinwhisperer@gmail.com> wrote:
>
>> Does compactionstats show only stats for completed compactions (100%)?
>> It might be that the compaction is running constantly, over and over again.
>> In that case I need to know what I can do to stop this constant
>> compaction so I can start a nodetool repair.
>>
>> Note that there is a lot of traffic on this columnfamily, so I'm not sure
>> if temporarily disabling compaction is an option. The repair will probably
>> take long as well.
>>
>> Sebastian and Rob: do you have any more ideas about the things I put in
>> this thread? Any help is appreciated!
>>
>> 2015-11-10 20:03 GMT+01:00 PenguinWhispererThe . <th3penguinwhisperer@gmail.com>:
>>
>>> Hi Sebastian,
>>>
>>> Thanks for your response.
>>>
>>> No swap is used. No offense, I just don't see a reason why having swap
>>> would be the issue here. I put swappiness at 1. I also have JNA installed.
>>> That should prevent Java from being swapped out as well, AFAIK.
>>>
>>> 2015-11-10 19:50 GMT+01:00 Sebastian Estevez <sebastian.estevez@datastax.com>:
>>>
>>>> Turn off swap.
>>>>
>>>> http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k__disable-swap
>>>>
>>>> All the best,
>>>>
>>>> Sebastián Estévez
>>>> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>>>>
>>>> DataStax is the fastest, most scalable distributed database technology,
>>>> delivering Apache Cassandra to the world's most innovative enterprises.
>>>> DataStax is built to be agile, always-on, and predictably scalable to any
>>>> size. With more than 500 customers in 45 countries, DataStax is the
>>>> database technology and transactional backbone of choice for the world's
>>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>>
>>>> On Tue, Nov 10, 2015 at 1:48 PM, PenguinWhispererThe . <th3penguinwhisperer@gmail.com> wrote:
>>>>
>>>>> I also have the following memory usage:
>>>>> [root@US-BILLINGDSX4 cassandra]# free -m
>>>>>              total       used       free     shared    buffers     cached
>>>>> Mem:         12024       9455       2569          0        110       2163
>>>>> -/+ buffers/cache:       7180       4844
>>>>> Swap:         2047          0       2047
>>>>>
>>>>> Still a lot free and a lot of free buffers/cache.
>>>>>
>>>>> 2015-11-10 19:45 GMT+01:00 PenguinWhispererThe . <th3penguinwhisperer@gmail.com>:
>>>>>
>>>>>> Still stuck with this. However, I enabled GC logging, which shows the
>>>>>> following:
>>>>>>
>>>>>> [root@myhost cassandra]# tail -f gc-1447180680.log
>>>>>> 2015-11-10T18:41:45.516+0000: 225.428: [GC 2721842K->2066508K(6209536K), 0.0199040 secs]
>>>>>> 2015-11-10T18:41:45.977+0000: 225.889: [GC 2721868K->2066511K(6209536K), 0.0221910 secs]
>>>>>> 2015-11-10T18:41:46.437+0000: 226.349: [GC 2721871K->2066524K(6209536K), 0.0222140 secs]
>>>>>> 2015-11-10T18:41:46.897+0000: 226.809: [GC 2721884K->2066539K(6209536K), 0.0224140 secs]
>>>>>> 2015-11-10T18:41:47.359+0000: 227.271: [GC 2721899K->2066538K(6209536K), 0.0302520 secs]
>>>>>> 2015-11-10T18:41:47.821+0000: 227.733: [GC 2721898K->2066557K(6209536K), 0.0280530 secs]
>>>>>> 2015-11-10T18:41:48.293+0000: 228.205: [GC 2721917K->2066571K(6209536K), 0.0218000 secs]
>>>>>> 2015-11-10T18:41:48.790+0000: 228.702: [GC 2721931K->2066780K(6209536K), 0.0292470 secs]
>>>>>> 2015-11-10T18:41:49.290+0000: 229.202: [GC 2722140K->2066843K(6209536K), 0.0288740 secs]
>>>>>> 2015-11-10T18:41:49.756+0000: 229.668: [GC 2722203K->2066818K(6209536K), 0.0283380 secs]
>>>>>> 2015-11-10T18:41:50.249+0000: 230.161: [GC 2722178K->2067158K(6209536K), 0.0218690 secs]
>>>>>> 2015-11-10T18:41:50.713+0000: 230.625: [GC 2722518K->2067236K(6209536K), 0.0278810 secs]
>>>>>>
>>>>>> This is a VM with 12GB of RAM. I raised HEAP_SIZE to 6GB and
>>>>>> HEAP_NEWSIZE to 800MB.
>>>>>>
>>>>>> Still the same result.
>>>>>>
>>>>>> This looks very similar to the following issue:
>>>>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201411.mbox/%3CCAJ=3xgRLsvpnZe0uXEYjG94rKhfXeU+jBR=Q3A-_C3rsdD5kug@mail.gmail.com%3E
>>>>>>
>>>>>> Is the only possibility to upgrade memory? I mean, I can't believe
>>>>>> it's just loading all its data into memory. That would mean having to
>>>>>> keep scaling up the node to keep it working?
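As a rough way to tell whether a node like this is GC-bound rather than busy compacting, assuming a JDK (jstat/jstack) is installed and that the daemon shows up in the process list as CassandraDaemon (both assumptions):

CASSANDRA_PID=$(pgrep -f CassandraDaemon | head -n 1)

# Heap occupancy and GC time once per second; steadily climbing FGC/FGCT or an
# old generation pinned near 100% points at GC pressure rather than compaction.
jstat -gcutil "$CASSANDRA_PID" 1000

# Confirm the heap sizes the JVM actually started with (-Xms/-Xmx/-Xmn).
ps -p "$CASSANDRA_PID" -o args= | tr ' ' '\n' | grep -E '^-Xm[sxn]'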
>>>>>> 2015-11-10 9:36 GMT+01:00 PenguinWhispererThe . <th3penguinwhisperer@gmail.com>:
>>>>>>
>>>>>>> Correction...
>>>>>>> I was grepping for Segmentation in the strace and it happens a lot.
>>>>>>>
>>>>>>> Do I need to run a scrub?
>>>>>>>
>>>>>>> 2015-11-10 9:30 GMT+01:00 PenguinWhispererThe . <th3penguinwhisperer@gmail.com>:
>>>>>>>
>>>>>>>> Hi Rob,
>>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> 2015-11-09 23:17 GMT+01:00 Robert Coli <rcoli@eventbrite.com>:
>>>>>>>>
>>>>>>>>> On Mon, Nov 9, 2015 at 1:29 PM, PenguinWhispererThe . <th3penguinwhisperer@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> In Opscenter I see that one of the nodes is orange. It seems like it's
>>>>>>>>>> working on compaction. I used nodetool compactionstats, and whenever I did
>>>>>>>>>> this the Completed count and percentage stayed the same (even with hours
>>>>>>>>>> in between).
>>>>>>>>>>
>>>>>>>>> Are you the same person from IRC, or a second report today of
>>>>>>>>> compaction hanging in this way?
>>>>>>>>>
>>>>>>>> Same person ;) I just didn't have enough to work with from the chat
>>>>>>>> there. I want to understand the issue more and see what I can tune or fix.
>>>>>>>> I want to do a nodetool repair before upgrading to 2.1.11, but the
>>>>>>>> compaction is blocking it.
>>>>>>>>
>>>>>>>>> What version of Cassandra?
>>>>>>>>
>>>>>>>> 2.0.9
>>>>>>>>
>>>>>>>>>> I currently don't see CPU load from Cassandra on that node, so it
>>>>>>>>>> seems stuck (somewhere mid 60%). Also, some other nodes have compaction on
>>>>>>>>>> the same columnfamily. I don't see any progress.
>>>>>>>>>>
>>>>>>>>>> WARN [RMI TCP Connection(554)-192.168.0.68] 2015-11-09 17:18:13,677 ColumnFamilyStore.java (line 2101) Unable to cancel in-progress compactions for usage_record_ptd. Probably there is an unusually large row in progress somewhere. It is also possible that buggy code left some sstables compacting after it was done with them
>>>>>>>>>>
>>>>>>>>>> - How can I confirm that nothing is happening?
>>>>>>>>>>
>>>>>>>>> Find the thread that is doing compaction and strace it. Generally
>>>>>>>>> it is one of the threads with a lower thread priority.
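A hedged sketch of that approach; the thread ID 61404 is simply the one reported further down in this thread, and the pgrep pattern is an assumption:

CASSANDRA_PID=$(pgrep -f CassandraDaemon | head -n 1)

# Per-thread view of the Cassandra process; CompactionExecutor threads usually
# run at a lower priority (higher nice value) and stand out when busy.
top -H -p "$CASSANDRA_PID"

# Map the busy native thread to its Java stack: jstack reports the thread ID
# as a hexadecimal "nid".
jstack "$CASSANDRA_PID" | grep -A 20 "$(printf 'nid=0x%x' 61404)"

# Attach strace to just that thread to see which syscalls it is making.
strace -p 61404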
>>>>>>>> I have 141 threads. Not sure if that's normal.
>>>>>>>>
>>>>>>>> This seems to be the one:
>>>>>>>> 61404 cassandr  24   4 8948m 4.3g 820m R 90.2 36.8 292:54.47 java
>>>>>>>>
>>>>>>>> In the strace I basically see this part repeating (with the occasional
>>>>>>>> "resource temporarily unavailable"):
>>>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>>>> getpriority(PRIO_PROCESS, 61404)        = 16
>>>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494045, NULL) = -1 EAGAIN (Resource temporarily unavailable)
>>>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494047, NULL) = 0
>>>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>>>> getpriority(PRIO_PROCESS, 61404)        = 16
>>>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>>>> futex(0x7f5c64145e28, FUTEX_WAKE_PRIVATE, 1) = 1
>>>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494049, NULL) = 0
>>>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>>>> getpriority(PRIO_PROCESS, 61404)        = 16
>>>>>>>>
>>>>>>>> But wait!
>>>>>>>> I also see this:
>>>>>>>> futex(0x7f5c64145e54, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f5c64145e50, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>>>>>>>> futex(0x1233854, FUTEX_WAIT_PRIVATE, 494055, NULL) = 0
>>>>>>>> futex(0x1233828, FUTEX_WAKE_PRIVATE, 1) = 0
>>>>>>>> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>>>>>>>>
>>>>>>>> This doesn't seem to happen that often, though.
>>>>>>>>
>>>>>>>>> Compaction often appears hung when decompressing a very large row,
>>>>>>>>> but usually not for "hours".
>>>>>>>>>
>>>>>>>>>> - Is it recommended to disable compaction above a certain data
>>>>>>>>>> size? (I believe 25GB on each node.)
>>>>>>>>>>
>>>>>>>>> It is almost never recommended to disable compaction.
>>>>>>>>>
>>>>>>>>>> - Can I stop this compaction? nodetool stop compaction
>>>>>>>>>> doesn't seem to work.
>>>>>>>>>>
>>>>>>>>> Killing the JVM ("the dungeon collapses!") would certainly stop
>>>>>>>>> it, but it'd likely just start again when you restart the node.
>>>>>>>>>
>>>>>>>>>> - Is stopping the compaction dangerous?
>>>>>>>>>>
>>>>>>>>> Not if you're on a version that properly cleans up partial
>>>>>>>>> compactions, which is most of them.
>>>>>>>>>
>>>>>>>>>> - Is killing the Cassandra process dangerous while compacting
>>>>>>>>>> (I did nodetool drain on one node)?
>>>>>>>>>>
>>>>>>>>> No. But nodetool drain probably couldn't actually stop the
>>>>>>>>> in-progress compaction either, FWIW.
>>>>>>>>>
>>>>>>>>>> This is the output of nodetool compactionstats grepped for the
>>>>>>>>>> keyspace that seems stuck.
>>>>>>>>>>
>>>>>>>>> Do you have gigantic rows in that keyspace? What does cfstats say
>>>>>>>>> about the largest row compaction has seen / do you have log messages
>>>>>>>>> about compacting large rows?
>>>>>>>>>
>>>>>>>> I don't know about the gigantic rows. How can I check?
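A hedged way to check for oversized rows/partitions, using the table named in this thread; cfstats and cfhistograms are the 2.0/2.1-era command names, and the log path is an assumption:

# Largest partition that compaction has seen for the suspect table.
nodetool cfstats billing.usage_record_ptd | grep -i 'Compacted partition maximum'

# Partition size distribution for the same table.
nodetool cfhistograms billing usage_record_ptd

# Rows too big to compact in memory are logged explicitly; on 2.0.x the
# threshold is in_memory_compaction_limit_in_kb in cassandra.yaml.
grep 'Compacting large row' /var/log/cassandra/system.log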
>>>>>>>> I've checked the logs and found this:
>>>>>>>> INFO [CompactionExecutor:67] 2015-11-10 02:34:19,077 CompactionController.java (line 192) Compacting large row billing/usage_record_ptd:177727:2015-10-14 00\:00Z (243992466 bytes) incrementally
>>>>>>>> So this is from 6 hours ago.
>>>>>>>>
>>>>>>>> I also see a lot of messages like this:
>>>>>>>> INFO [OptionalTasks:1] 2015-11-10 06:36:06,395 MeteredFlusher.java (line 58) flushing high-traffic column family CFS(Keyspace='mykeyspace', ColumnFamily='mycolumnfamily') (estimated 100317609 bytes)
>>>>>>>> And (although it's unrelated, might this impact compaction performance?):
>>>>>>>> WARN [Native-Transport-Requests:10514] 2015-11-10 06:33:34,172 BatchStatement.java (line 223) Batch of prepared statements for [billing.usage_record_ptd] is of size 13834, exceeding specified threshold of 5120 by 8714.
>>>>>>>>
>>>>>>>> It's like the compaction is only doing one sstable at a time and is
>>>>>>>> doing nothing for a long time in between.
>>>>>>>>
>>>>>>>> cfstats for this keyspace and columnfamily gives the following:
>>>>>>>>                 Table: mycolumnfamily
>>>>>>>>                 SSTable count: 26
>>>>>>>>                 Space used (live), bytes: 319858991
>>>>>>>>                 Space used (total), bytes: 319860267
>>>>>>>>                 SSTable Compression Ratio: 0.24265700071674673
>>>>>>>>                 Number of keys (estimate): 6656
>>>>>>>>                 Memtable cell count: 22710
>>>>>>>>                 Memtable data size, bytes: 3310654
>>>>>>>>                 Memtable switch count: 31
>>>>>>>>                 Local read count: 0
>>>>>>>>                 Local read latency: 0.000 ms
>>>>>>>>                 Local write count: 997667
>>>>>>>>                 Local write latency: 0.000 ms
>>>>>>>>                 Pending tasks: 0
>>>>>>>>                 Bloom filter false positives: 0
>>>>>>>>                 Bloom filter false ratio: 0.00000
>>>>>>>>                 Bloom filter space used, bytes: 12760
>>>>>>>>                 Compacted partition minimum bytes: 1332
>>>>>>>>                 Compacted partition maximum bytes: 43388628
>>>>>>>>                 Compacted partition mean bytes: 234682
>>>>>>>>                 Average live cells per slice (last five minutes): 0.0
>>>>>>>>                 Average tombstones per slice (last five minutes): 0.0
>>>>>>>>
>>>>>>>>>> I also frequently see lines like this in system.log:
>>>>>>>>>>
>>>>>>>>>> WARN [Native-Transport-Requests:11935] 2015-11-09 20:10:41,886 BatchStatement.java (line 223) Batch of prepared statements for [billing.usage_record_by_billing_period, billing.metric] is of size 53086, exceeding specified threshold of 5120 by 47966.
>>>>>>>>>>
>>>>>>>>> Unrelated.
>>>>>>>>>
>>>>>>>>> =Rob
>>>>>>>>>
>>>>>>>> Can I upgrade to 2.1.11 without doing a nodetool repair, given that
>>>>>>>> the compaction is stuck?
>>>>>>>> Another thing to mention is that nodetool repair hasn't run yet.
>>>>>>>> Cassandra got installed, but nobody bothered to schedule the repair.
>>>>>>>>
>>>>>>>> Thanks for looking into this!