From: Alain RODRIGUEZ
Date: Wed, 3 May 2017 15:00:01 +0100
Subject: Re: Drop tables takes too long
To: Bohdan Tantsiura
Cc: user@cassandra.apache.org

Hi,

A few comments:

> Long GC Pauses take about one minute

This is huge. About the JVM config, I haven't played much with G1GC, but the
following seems to be a bad idea according to its own comments:

## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
-XX:MaxGCPauseMillis=15000

# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
-XX:InitiatingHeapOccupancyPercent=30

# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
-XX:ParallelGCThreads=8

# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
-XX:ConcGCThreads=8

Why did you move that far from the defaults? Would you consider giving the
defaults a try on a canary node and monitoring / comparing its GC times to
those of the other nodes?
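Something along these lines should be enough to compare GC behaviour between
the canary node and the others (a rough sketch, untested; adjust the log path
to your install):

# GC pauses as logged by GCInspector on this node (the duration is in the message)
$ grep "GCInspector" /var/log/cassandra/system.log | tail -20

# JVM GC totals since the previous call; run it on each node and compare
$ nodetool gcstats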
> 1) ColumnFamilyStore.java:542 - Failed unregistering mbean:
> org.apache.cassandra.db:type=Tables,keyspace=...,table=...
> from MigrationStage thread

I am not sure about this one... :p

> 2) Read 1715 live rows and 1505 tombstone cells for query ...
> from ReadStage thread

Half of what was read for this query was deleted data, with obvious disk
space, disk throughput and latency consequences. This is an entire topic in
itself... Here is what I know about it:
thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html. I hope
it will help you solve your issue.

> 3) GCInspector.java:282 - G1 Young Generation GC in 1725ms

This might be related to your GC configuration or to some of the other issues
mentioned in your last mail.

> About 3000-6000 CompactionExecutor pending tasks appeared on all nodes
> from time to time.

Hum, that's weird. It's huge, but as you have so many tables I am not sure;
it might be a 'normal' issue when running with that many tables and MVs.

What do you mean by "from time to time"? For how long do these tasks stay
pending, and how frequently does this happen?

Is the number of SSTables constant and balanced between nodes?

Also, do you run full or incremental repairs? Do you use LCS or do some
manual compactions (or trigger any anti-compactions)?
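To check the SSTable side, something like this on each node should give a
quick picture (a sketch only; the table name below is just a placeholder,
pick one of your big ones):

# Compaction backlog: pending tasks and what is currently being compacted
$ nodetool compactionstats

# SSTable count, space used and tombstone ratios for one table
$ nodetool tablestats my_keyspace.my_table | grep -e "SSTable count" -e "Space used" -e "tombstones"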
> About 1000 MigrationStage pending tasks appeared on 2 nodes.

That's pending writes, meaning this Cassandra node can't cope with what is
thrown at it. It can be related to pending flushes (blocking writes), huge
Garbage Collection (Stop The World, including writes), hardware limits (CPU
busy with compactions?) or even a too conservative value for
concurrent_writes.

> About 700 InternalResponseState pending tasks appeared on 2 nodes.

I never had issues with this one, so I don't know much about it. But
according to Chris Lohfink in this post,
https://www.pythian.com/blog/guide-to-cassandra-thread-pools/#InternalResponseStage,
this thread pool is responsible for "Responding to non-client initiated
messages, including bootstrapping and schema checking", which again might be
related to the huge number of tables in the cluster. How is CPU doing, is
there any burst in CPU usage that could be related to these errors?

> About 60 MemtableFlushWriter appeared on 3 nodes.

How many MemtableFlushWriter threads are you using? Consider increasing that
number (or maybe the memtable size).
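If you are not sure what is currently set, a quick look at cassandra.yaml and
at the flush-related pools should tell you (the yaml path below is the stock
one, adjust to your install):

# Current flush / write path settings (commented-out lines mean the defaults are in use)
$ grep -e "memtable_flush_writers" -e "memtable_heap_space_in_mb" -e "concurrent_writes" /etc/cassandra/cassandra.yaml

# Pending and "All time blocked" counts for the flush-related pools
$ nodetool tpstats | grep -e "Pool Name" -e "MemtableFlushWriter" -e "MemtablePostFlush"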
> There were no blocked tasks, but there were "All time blocked" tasks (they
> were there before we started dropping tables) from 3 million to 20 million
> on different nodes.

Which tasks were dropped?

The cluster doesn't look completely healthy, but I believe it is possible to
improve things before thinking about splitting the tables across multiple
clusters. I would definitely not add more tables though...

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-04-28 14:35 GMT+01:00 Bohdan Tantsiura <bohdantan@gmail.com>:

> Thanks Alain,
>
> > Or is it only happening during drop table actions?
> Some other schema changes (e.g. adding columns to tables) also take too
> much time.
>
> Link to the complete set of GC options: https://pastebin.com/4qyENeyu
>
> > Have you had a look at logs, mainly errors and warnings?
> In the logs I found warnings of 3 types:
> 1) ColumnFamilyStore.java:542 - Failed unregistering mbean:
> org.apache.cassandra.db:type=Tables,keyspace=...,table=...
> from MigrationStage thread
> 2) Read 1715 live rows and 1505 tombstone cells for query ...
> from ReadStage thread
> 3) GCInspector.java:282 - G1 Young Generation GC in 1725ms. G1 Eden
> Space: 38017171456 -> 0; G1 Survivor Space: 2516582400 -> 2650800128;
> from Service Thread
>
> > Are there some pending, blocked or dropped tasks in the thread pool stats?
> About 3000-6000 CompactionExecutor pending tasks appeared on all nodes
> from time to time. About 1000 MigrationStage pending tasks appeared on 2
> nodes. About 700 InternalResponseState pending tasks appeared on 2 nodes.
> About 60 MemtableFlushWriter appeared on 3 nodes.
> There were no blocked tasks, but there were "All time blocked" tasks (they
> were there before we started dropping tables) from 3 million to 20 million
> on different nodes.
>
> > Are some resources constrained (CPU / disk IO, ...)?
> CPU and disk IO are not constrained.
>
> Thanks
>
> 2017-04-27 11:10 GMT+03:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>
>> Hi
>>
>>> Long GC Pauses take about one minute. But why it takes so much time and
>>> how that can be fixed?
>>
>> This is very long. Looks like you are having a major issue, and it is not
>> just about dropping tables... Or is it only happening during drop table
>> actions? Knowing the complete set of GC options in use could help here,
>> could you paste it here (or link to it)?
>>
>> Also, GC is often high as a consequence of other issues and not only when
>> 'badly' tuned.
>>
>>   - Have you had a look at logs, mainly errors and warnings?
>>
>>     $ grep -e "ERROR" -e "WARN" /var/log/cassandra/system.log
>>
>>   - Are there some pending, blocked or dropped tasks in the thread pool
>>     stats?
>>
>>     $ watch -d nodetool tpstats
>>
>>   - Are some resources constrained (CPU / disk IO, ...)?
>>
>> We have about 60 keyspaces with about 80 tables in each keyspace
>>
>> In each keyspace we also have 11 MVs
>>
>> Even if I believe we can dig into it and maybe improve things, I agree
>> with Carlos: this is a lot of tables (4800), and even more so a high
>> number of MVs (660). It might be interesting to split it somehow if
>> possible.
>>
>>> Cannot achieve consistency level ALL
>>
>> Finally, you could try to adjust the corresponding request timeout (not
>> sure if it is the global one or the truncate timeout), so it may succeed
>> even when nodes are having minute-long GCs, but that is a workaround, as a
>> one-minute GC will most definitely be an issue for the client queries
>> running (the default is a 10 second timeout, so many queries are probably
>> failing).
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-04-25 13:58 GMT+02:00 Bohdan Tantsiura <bohdantan@gmail.com>:
>>
>>> Thanks Zhao Yang,
>>>
>>> > Could you try some jvm tool to find out which threads are allocating
>>> > memory or gc? maybe the migration stage thread..
>>>
>>> I use Cassandra Cluster Manager to reproduce the issue locally. I tried
>>> to use VisualVM to find out which threads are allocating memory, but
>>> VisualVM does not see the cassandra processes and says "Cannot open
>>> application with pid". Then I tried YourKit Java Profiler. It created a
>>> snapshot when the process of one cassandra node failed.
>>> http://i.imgur.com/9jBcjcl.png - how CPU is used by threads.
>>> http://i.imgur.com/ox5Sozy.png - how memory is used by threads, but the
>>> biggest part of the memory is used by objects without allocation
>>> information. http://i.imgur.com/oqx9crX.png - which objects use the
>>> biggest part of the memory. Maybe you know some other good jvm tool that
>>> can show which threads use the biggest part of the memory?
>>>
>>> > BTW, is your cluster under high load while dropping tables?
>>>
>>> LA5 was <= 5 on all nodes almost all the time while dropping tables.
>>>
>>> Thanks
>>>
>>> 2017-04-21 19:49 GMT+03:00 Jasonstack Zhao Yang <
>>> zhaoyangsingapore@gmail.com>:
>>>
>>>> Hi Bohdan, Carlos,
>>>>
>>>> Could you try some jvm tool to find out which threads are allocating
>>>> memory or gc? Maybe the migration stage thread..
>>>>
>>>> BTW, is your cluster under high load while dropping tables?
>>>>
>>>> As far as I remember, in older C* versions it applies the schema
>>>> mutation in memory, i.e. DROP, then flushes all schema info into
>>>> sstables, then reads all on-disk schema back into memory (5k tables'
>>>> info + related column info)..
>>>>
>>>> > You also might need to increase the node count if you're resource
>>>> > constrained.
>>>>
>>>> More nodes won't help and would most probably make it worse due to
>>>> coordination.
>>>>
>>>> Zhao Yang
>>>>
>>>> On Fri, 21 Apr 2017 at 21:10 Bohdan Tantsiura <bohdantan@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The problem is still not solved. Does anybody have any idea what to do
>>>>> with it?
>>>>>
>>>>> Thanks
>>>>>
>>>>> 2017-04-20 15:05 GMT+03:00 Bohdan Tantsiura <bohdantan@gmail.com>:
>>>>>
>>>>>> Thanks Carlos,
>>>>>>
>>>>>> In each keyspace we also have 11 MVs.
>>>>>>
>>>>>> It is impossible to reduce the number of tables now. Long GC pauses
>>>>>> take about one minute. But why does it take so much time and how can
>>>>>> that be fixed?
>>>>>>
>>>>>> Each node in the cluster has 128GB RAM, so resources are not
>>>>>> constrained now.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> 2017-04-20 13:18 GMT+03:00 Carlos Rolo <rolo@pythian.com>:
>>>>>>
>>>>>>> You have 4800 tables in total? That is a lot of tables, plus MVs? Or
>>>>>>> are the MVs already counted in the 60*80 figure?
>>>>>>>
>>>>>>> I would recommend reducing the number of tables. The other thing is
>>>>>>> that you need to check your log file for GC pauses, and how long
>>>>>>> those pauses take.
>>>>>>>
>>>>>>> You also might need to increase the node count if you're resource
>>>>>>> constrained.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Carlos Juzarte Rolo
>>>>>>> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>>>>>>>
>>>>>>> Pythian - Love your data
>>>>>>>
>>>>>>> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
>>>>>>> linkedin.com/in/carlosjuzarterolo
>>>>>>> Mobile: +351 918 918 100
>>>>>>> www.pythian.com
>>>>>>>
>>>>>>> On Thu, Apr 20, 2017 at 11:10 AM, Bohdan Tantsiura <
>>>>>>> bohdantan@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are using cassandra 3.10 in a 10-node cluster with replication
>>>>>>>> = 3. MAX_HEAP_SIZE=64GB on all nodes, and G1 GC is used. We have
>>>>>>>> about 60 keyspaces with about 80 tables in each keyspace. We had to
>>>>>>>> delete three tables and two materialized views from each keyspace.
>>>>>>>> It began to take more and more time for each next keyspace (for
>>>>>>>> some keyspaces it took about 30 minutes) and then failed with
>>>>>>>> "Cannot achieve consistency level ALL". After restarting, the same
>>>>>>>> thing repeated. It seems that cassandra hangs on GC. How can that
>>>>>>>> be solved?
>>>>>>>>
>>>>>>>> Thanks
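PS: related to the slow schema changes and the "Cannot achieve consistency
level ALL" error discussed above, two more quick checks might help (a sketch
only; the yaml path is the stock one, adjust to your install):

# All nodes should report a single schema version; several versions means a schema disagreement
$ nodetool describecluster

# The timeouts in play (this matches the global request_timeout_in_ms, the per-operation ones and the truncate one)
$ grep -e "request_timeout_in_ms" -e "truncate_request_timeout" /etc/cassandra/cassandra.yaml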