Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAOac0GCtGPrSNGCEPs0dXBrFDLQSrTV-CDFArhuJ7gSDtdYquQ@mail.gmail.com>
References: 
 <CAOac0GBhL-gS3_tpGnFUQ=Zc1t5iPJFw2TL9_kn-W01Jr7zAtQ@mail.gmail.com>
	<CAKaZCX6EEogHEwXEgC_T_XyUvqa7-edtwY_uJU6nwfegCfNy_Q@mail.gmail.com>
	<CAOac0GDPE5DzoA66dUTbOcHTxocSrKwtHkZ6fH_zVVpU34hTvQ@mail.gmail.com>
	<8DBDEFD8-8348-4C37-9B1E-921D4F9151D2@crowdstrike.com>
	<CAOac0GCtGPrSNGCEPs0dXBrFDLQSrTV-CDFArhuJ7gSDtdYquQ@mail.gmail.com>
Date: Thu, 11 Feb 2016 09:36:16 +0100
Message-ID: 
 <CAJgLgch8uJO6SWifJEOfDUCbw6=AMPSxtvVBCWec-m8Ub+UiBg@mail.gmail.com>
Subject: Re: Debugging write timeouts on Cassandra 2.2.5
From: Fabrice Facorat <fabrice.facorat@gmail.com>
To: user@cassandra.apache.org
Cc: Peter Norton <pcn@librato.com>
Content-Type: text/plain; charset=UTF-8

Are your commitlog and data on the same disk? If so, you should put the
commitlog on a separate disk that doesn't see a lot of I/O.
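
A quick way to check this, as a minimal sketch: compare the device IDs of
the two directories. The paths below are the usual package-install defaults
and are an assumption on my side; use the values of commitlog_directory and
data_file_directories from your cassandra.yaml. The helper name same_device
is hypothetical.

```python
import os


def same_device(path_a, path_b):
    """Return True if both paths are on the same filesystem/device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Assumed default locations; adjust to match your cassandra.yaml.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"
    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        print("same device:", same_device(commitlog_dir, data_dir))
    else:
        print("adjust the paths to match your cassandra.yaml")
```

Note that two partitions or logical volumes on the same physical disk report
different device IDs, so a False result does not prove the I/O is isolated.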

Other I/O can have a large impact on commitlog writes and may even
block them.
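
To see whether writes on the commitlog volume actually stall, one rough
sketch is to time small write+fsync calls in that directory while the other
I/O is running. This is only an illustration, not how Cassandra itself syncs
the commitlog; the function name and its parameters are hypothetical, and the
example run uses the system temp directory as a stand-in.

```python
import os
import tempfile
import time


def fsync_latencies_ms(directory, samples=20, block_size=4096):
    """Time write+fsync of small blocks in `directory`, in milliseconds."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = b"\0" * block_size
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the block to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file
    return latencies


if __name__ == "__main__":
    # Stand-in directory; point this at the commitlog volume instead.
    result = fsync_latencies_ms(tempfile.gettempdir())
    print("max fsync latency: %.2f ms" % max(result))
```

Occasional samples that are far above the rest would suggest the volume is
being held up by other I/O.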

An example of the impact I/O can have, even on async writes:
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic

2016-02-11 0:31 GMT+01:00 Mike Heffner <mike@librato.com>:
> Jeff,
>
> We have both commitlog and data on a 4TB EBS with 10k IOPS.
>
> Mike
>
> On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>>
>> What disk size are you using?
>>
>>
>>
>> From: Mike Heffner
>> Reply-To: "user@cassandra.apache.org"
>> Date: Wednesday, February 10, 2016 at 2:24 PM
>> To: "user@cassandra.apache.org"
>> Cc: Peter Norton
>> Subject: Re: Debugging write timeouts on Cassandra 2.2.5
>>
>> Paulo,
>>
>> Thanks for the suggestion, we ran some tests against CMS and saw the same
>> timeouts. On that note though, we are going to try doubling the instance
>> sizes and testing with double the heap (even though current usage is low).
>>
>> Mike
>>
>> On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta <pauloricardomg@gmail.com>
>> wrote:
>>>
>>> Are you using the same GC settings as the staging 2.0 cluster? If not,
>>> could you try using the default GC settings (CMS) and see if that changes
>>> anything? This is just a wild guess, but there were reports before of
>>> G1-caused instabilities with small heap sizes (< 16GB - see CASSANDRA-10403
>>> for more context). Please ignore if you already tried reverting back to CMS.
>>>
>>> 2016-02-10 16:51 GMT-03:00 Mike Heffner <mike@librato.com>:
>>>>
>>>> Hi all,
>>>>
>>>> We've recently embarked on a project to update our Cassandra
>>>> infrastructure running on EC2. We are long time users of 2.0.x and are
>>>> testing out a move to version 2.2.5 running on VPC with EBS. Our test setup
>>>> is a 3 node, RF=3 cluster supporting a small write load (mirror of our
>>>> staging load).
>>>>
>>>> We are writing at QUORUM and while p95's look good compared to our
>>>> staging 2.0.x cluster, we are seeing frequent write operations that time out
>>>> at the max write_request_timeout_in_ms (10 seconds). CPU across the cluster
>>>> is < 10% and EBS write load is < 100 IOPS. Cassandra is running with the
>>>> Oracle JDK 8u60 and we're using G1GC and any GC pauses are less than 500ms.
>>>>
>>>> We run on c4.2xl instances with GP2 EBS attached storage for data and
>>>> commitlog directories. The nodes are using EC2 enhanced networking and have
>>>> the latest Intel network driver module. We are running on HVM instances
>>>> using Ubuntu 14.04.2.
>>>>
>>>> Our schema is 5 tables, all with COMPACT STORAGE. Each table is similar
>>>> to the definition here:
>>>> https://gist.github.com/mheffner/4d80f6b53ccaa24cc20a
>>>>
>>>> This is our cassandra.yaml:
>>>> https://gist.github.com/mheffner/fea80e6e939dd483f94f#file-cassandra-yaml
>>>>
>>>> Like I mentioned we use 8u60 with G1GC and have used many of the GC
>>>> settings in Al Tobey's tuning guide. This is our upstart config with JVM and
>>>> other CPU settings: https://gist.github.com/mheffner/dc44613620b25c4fa46d
>>>>
>>>> We've used several of the sysctl settings from Al's guide as well:
>>>> https://gist.github.com/mheffner/ea40d58f58a517028152
>>>>
>>>> Our client application is able to write using either Thrift batches
>>>> using the Astyanax driver or CQL async INSERTs using the DataStax Java driver.
>>>>
>>>> For testing against Thrift (our legacy infra uses this) we write batches
>>>> of anywhere from 6 to 1500 rows at a time. Our p99 for batch execution is
>>>> around 45ms but our maximum (p100) sits below 150ms except when it
>>>> periodically spikes to the full 10 seconds.
>>>>
>>>> Testing the same write path using CQL writes instead demonstrates
>>>> similar behavior. Low p99s except for periodic full timeouts. We enabled
>>>> tracing for several operations but were unable to get a trace that completed
>>>> successfully -- Cassandra started logging many messages as:
>>>>
>>>> INFO  [ScheduledTasks:1] - MessagingService.java:946 - _TRACE messages
>>>> were dropped in last 5000 ms: 52499 for internal timeout and 0 for cross
>>>> node timeout
>>>>
>>>> And all the traces contained rows with a "null" source_elapsed value:
>>>> https://gist.githubusercontent.com/mheffner/1d68a70449bd6688a010/raw/0327d7d3d94c3a93af02b64212e3b7e7d8f2911b/trace.out
>>>>
>>>>
>>>> We've exhausted as many configuration option permutations as we can
>>>> think of. This cluster does not appear to be under any significant load and
>>>> latencies seem to largely fall in two bands: low normal or max timeout. This
>>>> seems to imply that something is getting stuck and timing out at the max
>>>> write timeout.
>>>>
>>>> Any suggestions on what to look for? We had debug enabled for a while but
>>>> we didn't see any message that pointed to something obvious. Happy to provide
>>>> any more information that may help.
>>>>
>>>> We are pretty much at the point of sprinkling debug around the code to
>>>> track down what could be blocking.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Mike
>>>>
>>>> --
>>>>
>>>>   Mike Heffner <mike@librato.com>
>>>>   Librato, Inc.
>>>>
>>>
>>
>


-- 
Close the World, Open the Net
http://www.linux-wizard.net