incubator-cassandra-user mailing list archives

From Jasdeep Hundal <dsjas...@gmail.com>
Subject Re: Waiting on read repair?
Date Tue, 19 Mar 2013 16:12:10 GMT
No secondary indexes, and I don't think it's IO. Commit log and data are on
separate SSDs (and C* is the only user of those disks), and the total
amount of data being written to the cluster is much less than the
SSDs are capable of writing; including replication, I think we're at
about 10% of what the disks can handle (not sure how much of an
effect compaction has).

Using jstack, we found some switchLock contention related to a flush
being blocked from joining the memtable flush queue, which in turn was
blocking row mutations to that memtable. Increasing the queue size and
the number of flush writers seems to have helped significantly (probably
worth mentioning that we have a lot more CFs than the 4 in that log
excerpt):
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
FlushWriter                       9         9          29336         0                39
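
For reference, the two cassandra.yaml settings involved are below; the
values shown are illustrative only, not the exact numbers we ended up with:

# cassandra.yaml (1.2) - illustrative values
# How many full memtables may queue up waiting for a flush writer.
memtable_flush_queue_size: 8
# How many threads write flushed memtables to disk.
memtable_flush_writers: 4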

One thing I don't have a feel for is what the effects of increasing
the number of flush writers would be. Would you happen to have any
guidance on that?

Thanks,
Jasdeep


On Tue, Mar 19, 2013 at 1:16 AM, aaron morton <aaron@thelastpickle.com> wrote:
> I checked the flushwriter thread pool stats and saw this:
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> FlushWriter                       1         5          86183         1             17582
>
> That's not good.
> Is the IO system over-utilised?
>
> In my setup, I have one extremely high traffic column family that is
>
> Any secondary indexes? If so see the comments for memtable_flush_queue_size
> in the yaml file.
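> From memory, the relevant yaml comment reads roughly like this (paraphrased,
> please check your own copy):
>
> # The number of full memtables to allow pending flush, that is, waiting for
> # a writer thread. At a minimum, set this to the maximum number of secondary
> # indexes created on a single CF.
> memtable_flush_queue_size: 4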
>
> From the following log output, it looks like the CF with the large
> data load is blocking the flush of the other CFs.
>
> Not sure that is the case.
> In the log messages the commit log is rotating and needs to free up an old
> log segment (on the OptionalTasks thread), so it is flushing all of the CFs
> that have something written to that log segment.
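> The commit log settings that drive this are in the yaml as well; roughly
> (1.2 defaults from memory, please verify against your own file):
>
> # Size of each commit log segment; a segment can only be recycled once every
> # CF with unflushed data in it has been flushed.
> commitlog_segment_size_in_mb: 32
> # Total commit log space; exceeding this also forces flushes of the oldest
> # dirty CFs.
> commitlog_total_space_in_mb: 4096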
>
>
> This could be IO not keeping up; it's unlikely to be a switch lock
> issue if you only have 4 CFs. Also, have you checked for GC messages in
> the C* logs?
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/03/2013, at 12:25 PM, Jasdeep Hundal <dsjas297@gmail.com> wrote:
>
> Thanks for the info; got a couple of follow-up questions, and just as
> a note, this is on Cassandra 1.2.0.
>
> I checked the flushwriter thread pool stats and saw this:
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> FlushWriter                       1         5          86183         1             17582
>
> Also, memtable_flush_queue_size is set to 4 and
> memtable_flush_writers is set to 1.
>
> In my setup, I have one extremely high traffic column family that is
> flushing lots of data at once (occasionally hitting hundreds of
> megabytes), and several smaller CFs for which flushes involve only a
> few bytes of data.
>
> From the following log output, it looks like the CF with the large
> data load is blocking the flush of the other CFs. Would increasing
> memtable_flush_queue_size (I've got plenty of memory) and
> memtable_flush_writers allow the smaller CF flushes to return faster?
> Or, given that I see the smaller CFs being flushed when not much has
> been written to them, should I try reducing
> commitlog_segment_size_in_mb?
>
> 168026:2013-03-18 17:53:41,938 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-data@2098591494(458518528/458518528 serialized/live bytes,
> 111732 ops)
> 168028:2013-03-18 17:53:47,064 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-metadata@252512204(2295/2295 serialized/live bytes, 64 ops)
> 168029:2013-03-18 17:53:47,065 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-counters@544926156(363/363 serialized/live bytes, 12 ops)
> 168030:2013-03-18 17:53:47,066 INFO  [OptionalTasks:1]
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable
> (ColumnFamilyStore.java:647) - Enqueuing flush of
> Memtable-container_counters@1703633084(430/430 serialized/live bytes,
> 83 ops)
> 168032:2013-03-18 17:53:51,950 INFO  [FlushWriter:3]
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents
> (Memtable.java:458) - Completed flushing
> /mnt/test/jasdeep/data/jasdeep-data-ia-720454-Data.db (391890130
> bytes) for commitlog position ReplayPosition(segmentId=1363628611044,
> position=21069295)
> 168050:2013-03-18 17:53:55,948 INFO  [FlushWriter:3]
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents
> (Memtable.java:458) - Completed flushing
> /mnt/test/jasdeep/metadata/jasdeep-metadata-ia-1280-Data.db (833
> bytes) for commitlog position ReplayPosition(segmentId=1363628611047,
> position=4213859)
> 168052:2013-03-18 17:53:55,966 INFO  [FlushWriter:3]
> org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents
> (Memtable.java:458) - Completed flushing
> /mnt/test/jasdeep/counters/jasdeep-counters-ia-1204-Data.db (342
> bytes) for commitlog position ReplayPosition(segmentId=1363628611047,
> position=4213859)
>
> Thanks again,
> Jasdeep
>
>
>
> On Mon, Mar 18, 2013 at 10:24 AM, aaron morton <aaron@thelastpickle.com>
> wrote:
>
> 1. With a ConsistencyLevel of quorum, does
> FBUtilities.waitForFutures() wait for read repair to complete before
> returning?
>
> No.
> That's just a utility method.
> Nothing on the read path waits for Read Repair; it is controlled by the
> read_repair_chance CF property and is all async to the client request.
> There is no CL involved; the messages are sent to individual nodes.
>
> 2. When read repair applies a mutation, it needs to obtain a lock for
> the associated memtable.
>
> What lock are you referring to?
> When Read Repair (the RowDataResolver) wants to send a mutation it uses the
> MessagingService. On the write path there is a server-wide RW lock called the
> sync lock.
>
> I've seen read repair spend a few seconds stalling in
> org.apache.cassandra.db.Table.apply.
>
> This could be contention around the sync lock; look for blocked tasks in
> the flush writer thread pool.
>
> I did a talk on Cassandra internals at ApacheCon three weeks ago; not sure
> when the video is going to be up, but here are the slides:
> http://www.slideshare.net/aaronmorton/apachecon-nafeb2013
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/03/2013, at 12:21 PM, Jasdeep Hundal <dsjas297@gmail.com> wrote:
>
> I've got a couple of questions related to issues I'm encountering using
> Cassandra under a heavy write load:
>
> 1. With a ConsistencyLevel of quorum, does
> FBUtilities.waitForFutures() wait for read repair to complete before
> returning?
> 2. When read repair applies a mutation, it needs to obtain a lock for
> the associated memtable. Does compaction obtain this same lock? (I'm
> asking because I've seen read repair spend a few seconds stalling in
> org.apache.cassandra.db.Table.apply).
>
> Thanks,
> Jasdeep
>
>
>
