cassandra-user mailing list archives

From Jason Wee <peich...@gmail.com>
Subject Re: massive spikes in read latency
Date Tue, 07 Jan 2014 13:15:45 GMT
    /**
     * Verbs it's okay to drop if the request has been queued longer than the
     * request timeout.  These all correspond to client requests or something
     * triggered by them; we don't want to drop internal messages like
     * bootstrap or repair notifications.
     */
    public static final EnumSet<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.BINARY,
                                                                   Verb._TRACE,
                                                                   Verb.MUTATION,
                                                                   Verb.READ_REPAIR,
                                                                   Verb.READ,
                                                                   Verb.RANGE_SLICE,
                                                                   Verb.PAGED_RANGE,
                                                                   Verb.REQUEST_RESPONSE);
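
To illustrate what the comment describes, here is a minimal, self-contained sketch of how such a set might be used to decide whether a queued message gets dropped. This is not Cassandra's actual code path: the class name, the `shouldDrop` helper, the reduced `Verb` enum, and the 10-second timeout are all assumptions made for illustration. The timeout stands in for whatever request timeout is configured in the yaml file.

```java
import java.util.EnumSet;

class DroppableVerbSketch {
    // Hypothetical, reduced verb list; the real enum has many more entries.
    enum Verb { MUTATION, READ, RANGE_SLICE, REQUEST_RESPONSE, GOSSIP_DIGEST_SYN }

    // Only client-triggered verbs are droppable; gossip is deliberately left out.
    static final EnumSet<Verb> DROPPABLE_VERBS =
        EnumSet.of(Verb.MUTATION, Verb.READ, Verb.RANGE_SLICE, Verb.REQUEST_RESPONSE);

    /**
     * Returns true if the message has waited in the queue longer than the
     * timeout AND its verb is one that is safe to drop.
     */
    static boolean shouldDrop(Verb verb, long enqueuedAtMillis, long nowMillis, long timeoutMillis) {
        long queuedFor = nowMillis - enqueuedAtMillis;
        return DROPPABLE_VERBS.contains(verb) && queuedFor > timeoutMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = 10_000; // assumed value, for illustration only
        System.out.println(shouldDrop(Verb.READ, 0, 12_000, timeoutMillis));              // true
        System.out.println(shouldDrop(Verb.READ, 0, 3_000, timeoutMillis));               // false
        System.out.println(shouldDrop(Verb.GOSSIP_DIGEST_SYN, 0, 12_000, timeoutMillis)); // false
    }
}
```

In this sketch, a READ that sat in the queue past the timeout is discarded rather than executed, which is the kind of event that would show up as a dropped read message, while an internal message is kept no matter how long it waited.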


The short-term solution would probably be to increase the timeout in your yaml
file, but I suggest you get the monitoring graphs (internode ping, block I/O)
ready, since they will give a better indication of what the exact problem is.
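
If no graphing is in place yet, a rough stand-in for the internode ping measurement is to time a TCP connect from one node to another during a spike. The sketch below is only that, a sketch: the class and method names are hypothetical, it measures connect time rather than ICMP ping, and the default port 7000 assumes the default storage_port has not been changed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ConnectLatencyProbe {
    /** Times one TCP connect to host:port and returns the elapsed milliseconds. */
    static double connectMillis(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            long start = System.nanoTime();
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000.0;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults: local host and the default internode port.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7000;
        // Take a few samples one second apart and print each one.
        for (int i = 0; i < 5; i++) {
            double ms = connectMillis(host, port, 2000);
            System.out.printf("%s:%d connect took %.2f ms%n", host, port, ms);
            Thread.sleep(1000);
        }
    }
}
```

Running it against a neighboring node before and during a spike would show whether the network between nodes slows down at the same time as the reads.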

Jason


On Tue, Jan 7, 2014 at 2:30 AM, Blake Eggleston <blake@shift.com> wrote:

> That’s a good point. CPU steal time is very low, but I haven’t observed
> internode ping times during one of the peaks, I’ll have to check that out.
> Another thing I’ve noticed is that cassandra starts dropping read messages
> during the spikes, as reported by tpstats. This indicates that there are too
> many queries for cassandra to handle. However, as I mentioned earlier, the
> spikes aren’t correlated to an increase in reads.
>
> On Jan 5, 2014, at 3:28 PM, Blake Eggleston <blake@shift.com> wrote:
>
> > Hi,
> >
> > I’ve been having a problem with 3 neighboring nodes in our cluster
> having their read latencies jump up to 9000ms - 18000ms for a few minutes
> (as reported by opscenter), then come back down.
> >
> > We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with
> cassandra reading and writing to 2 raided ssds.
> >
> > I’ve added 2 nodes to the struggling part of the cluster, and aside from
> the latency spikes shifting onto the new nodes, it has had no effect. I
> suspect that a single key that lives on the first stressed node may be
> being read from heavily.
> >
> > The spikes in latency don’t seem to be correlated to an increase in
> reads. The cluster usually handles a maximum workload of
> 4200 reads/sec per node, with writes being significantly less, at ~200/sec
> per node. Usually it will be fine with this, with read latencies at around
> 3.5-10 ms/read, but once or twice an hour the latencies on the 3 nodes will
> shoot through the roof.
> >
> > The disks aren’t showing serious use, with read and write rates on the
> ssd volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra
> process is maintaining 1000-1100 open connections. GC logs aren’t showing
> any serious gc pauses.
> >
> > Any ideas on what might be causing this?
> >
> > Thanks,
> >
> > Blake
>
>
