Subject: Re: Why the StageManager thread pools have 60 seconds keepalive time?
From: aaron morton
To: user@cassandra.apache.org
Date: Wed, 22 Aug 2012 16:49:03 +1200

> One thing we did change in the past weeks was the memtable_flush_queue_size, in order to occupy less heap space with memtables; this was due to having received this warning message and some OOM exceptions:

Danger.

> Do you know any strategy to diagnose whether memtables flushing to disk and locking on the switchLock is the main cause of the dropped messages? I've gone through the source code but haven't seen any metrics reporting on maybeSwitchMemtable blocking times.

As a matter of fact I do :) It was the first thing in my Cassandra SF talk:

http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance/6
http://www.datastax.com/events/cassandrasummit2012/presentations

If you reduce memtable_flush_queue_size too far, writes will block. When this happens you will see the MeteredFlusher say it wants to flush X CFs, but you will only see a few messages that say "Enqueuing flush of …".

In a "FlushWriter-*" thread you will see the Memtable log "Writing …" when it starts flushing and "Completed flushing …" when done. If the MeteredFlusher is blocked, it will log "Enqueuing flush of …" immediately when the Memtable starts writing the next SSTable.
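As a rough illustration of that diagnosis (this sketch is not part of the original mail, and the exact log wording and timestamp layout are assumptions that vary between Cassandra versions), something like the following could pair each "Enqueuing flush of Memtable-…" line in system.log with the later "Writing Memtable-…" line and print how long the flush sat in the queue:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch: measure how long each memtable waits between being enqueued for
    // flush and a FlushWriter actually starting to write it. Assumes log lines
    // carry a "yyyy-MM-dd HH:mm:ss,SSS" timestamp and 1.1-era messages of the form
    // "Enqueuing flush of Memtable-<cf>@<id>(...)" and "Writing Memtable-<cf>@<id>(...)".
    public class FlushQueueLag
    {
        private static final Pattern TS = Pattern.compile("(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})");
        private static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");

        public static void main(String[] args) throws IOException, ParseException
        {
            Map<String, Date> enqueued = new HashMap<String, Date>();
            BufferedReader in = new BufferedReader(new FileReader(args[0])); // e.g. /var/log/cassandra/system.log
            String line;
            while ((line = in.readLine()) != null)
            {
                Matcher ts = TS.matcher(line);
                if (!ts.find())
                    continue;
                Date when = FMT.parse(ts.group(1));
                if (line.contains("Enqueuing flush of Memtable-"))
                    enqueued.put(memtableId(line), when);
                else if (line.contains("Writing Memtable-"))
                {
                    Date start = enqueued.remove(memtableId(line));
                    if (start != null)
                        System.out.println(memtableId(line) + " waited " + (when.getTime() - start.getTime()) + " ms in the flush queue");
                }
            }
            in.close();
        }

        // Crude extraction of the "Memtable-<cf>@<id>" token used to pair the two log lines.
        private static String memtableId(String line)
        {
            int i = line.indexOf("Memtable-");
            int j = line.indexOf('(', i);
            return j > i ? line.substring(i, j) : line.substring(i);
        }
    }

If the reported wait times grow while the "FlushWriter-*" threads are continuously busy, that is consistent with the flush queue (and the switchLock behind it) being what stalls writes, rather than the flush writers themselves.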
Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2012, at 6:38 AM, Guillermo Winkler wrote:

> Aaron, thanks for your answer.
>
> We do have big batch updates, not always with the columns belonging to the same row (i.e. many threads are needed to handle the updates), but it did not represent a problem when the CFs had less data on them.
>
> One thing we did change in the past weeks was the memtable_flush_queue_size, in order to occupy less heap space with memtables; this was due to having received this warning message and some OOM exceptions:
>
> logger.warn(String.format("Reducing %s capacity from %d to %s to reduce memory pressure",
>             cacheType, getCapacity(), newCapacity));
>
> Do you know any strategy to diagnose whether memtables flushing to disk and locking on the switchLock is the main cause of the dropped messages? I've gone through the source code but haven't seen any metrics reporting on maybeSwitchMemtable blocking times.
>
> Thanks again,
> Guille
>
> On Sun, Aug 19, 2012 at 5:21 AM, aaron morton wrote:
> You're seeing dropped mutations reported from nodetool tpstats?
>
> Take a look at the logs. Look for messages from the MessagingService with the pattern "{} {} messages dropped in last {}ms". They will be followed by info about the TP stats.
>
> First would be the workload. Are you sending very big batch_mutate or multiget requests? Each row in the request turns into a command in the appropriate thread pool. This can result in other requests waiting a long time for their commands to get processed.
>
> Next would be looking for GC and checking that memtable_flush_queue_size is set high enough (check the yaml for docs).
>
> After that I would look at winding concurrent_writes (and I assume concurrent_reads) back. Any time I see weirdness I look for config changes and see what happens when they are returned to the default or near default. Do you have 16 _physical_ cores?
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/08/2012, at 10:01 AM, Guillermo Winkler wrote:
>
>> Aaron, thanks for your answer.
>>
>> I'm actually tracking a problem where mutations get dropped and cfstats shows no activity whatsoever. I have 100 threads for the mutation pool, no running or pending tasks, but some mutations get dropped nonetheless.
>>
>> I'm thinking it might be a scheduling problem, but I'm not really sure yet.
>>
>> Have you ever seen a case of dropped mutations with the system under light load?
>>
>> Thanks,
>> Guille
>>
>> On Thu, Aug 16, 2012 at 8:22 PM, aaron morton wrote:
>> That's some pretty old code. I would guess it was done that way to conserve resources. And _I think_ thread creation is pretty lightweight.
>>
>> Jonathan / Brandon / others - opinions?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 17/08/2012, at 8:09 AM, Guillermo Winkler wrote:
>>
>>> Hi, I have a Cassandra cluster where I'm seeing a lot of thread thrashing in the mutation pool.
>>>
>>> MutationStage:72031
>>>
>>> Threads get created and disposed in batches of 100 every few minutes; since it's a 16-core server, concurrent_writes is set to 100 in cassandra.yaml:
>>>
>>> concurrent_writes: 100
>>>
>>> I've seen in the StageManager class that these pools get created with a 60-second keepalive time:
>>>
>>> DebuggableThreadPoolExecutor -> allowCoreThreadTimeOut(true);
>>>
>>> StageManager -> public static final long KEEPALIVE = 60; // seconds to keep "extra" threads alive for when idle
>>>
>>> Is there a reason for it to be this way?
>>>
>>> Why not have a fixed-size pool with Integer.MAX_VALUE as the keepalive, since corePoolSize and maxPoolSize are set to the same size?
>>>
>>> Thanks,
>>> Guille
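For anyone reading this in the archive, here is a minimal, self-contained sketch (plain java.util.concurrent, not Cassandra's actual DebuggableThreadPoolExecutor) of the two configurations being compared: a pool with corePoolSize == maximumPoolSize, a 60-second keepalive and allowCoreThreadTimeOut(true), versus the fixed pool with an effectively infinite keepalive that the question suggests. The pool size of 100 mirrors the concurrent_writes value above; everything else is illustrative.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class KeepAliveDemo
    {
        public static void main(String[] args) throws InterruptedException
        {
            // Roughly the StageManager-style setup: core == max == 100,
            // 60s keepalive, and core threads allowed to time out when idle.
            ThreadPoolExecutor stageStyle = new ThreadPoolExecutor(
                    100, 100, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
            stageStyle.allowCoreThreadTimeOut(true);

            // The alternative from the question: same size, but the threads never time out.
            ThreadPoolExecutor fixed = new ThreadPoolExecutor(
                    100, 100, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

            burst(stageStyle);
            burst(fixed);

            // After more than 60 idle seconds the stage-style pool has torn down all of
            // its threads (so the next burst pays thread-creation cost again), while the
            // fixed pool still holds 100 idle threads and their stacks.
            TimeUnit.SECONDS.sleep(70);
            System.out.println("stage-style pool size after idle: " + stageStyle.getPoolSize()); // expect 0
            System.out.println("fixed pool size after idle:       " + fixed.getPoolSize());      // expect 100

            stageStyle.shutdown();
            fixed.shutdown();
        }

        // Submit a burst of cheap tasks and wait for the pool to drain.
        private static void burst(ThreadPoolExecutor pool) throws InterruptedException
        {
            for (int i = 0; i < 1000; i++)
                pool.execute(new Runnable() { public void run() { /* simulate a mutation */ } });
            while (pool.getActiveCount() > 0 || !pool.getQueue().isEmpty())
                TimeUnit.MILLISECONDS.sleep(10);
        }
    }

The trade-off matches Aaron's guess above: with allowCoreThreadTimeOut(true) an idle stage gives its thread stacks back after a minute, at the cost of recreating threads when the next burst arrives, which is consistent with the MutationStage thread numbers climbing under bursty load.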