cassandra-commits mailing list archives

From "Robert Coli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4446) nodetool drain sometimes doesn't mark commitlog fully flushed
Date Fri, 18 Jan 2013 23:46:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557767#comment-13557767 ]

Robert Coli commented on CASSANDRA-4446:
----------------------------------------

How to reproduce, based on multiple reports:

1) Drain and stop a cluster that uses counters on 1.0.x
2) Start the same cluster on 1.1.x
3) Observe commitlog replay of the counter columnfamily, and that the counters have over-counted

Attached is a log from the latest reporter, CASSANDRA-4446--1.0.12_to_1.1.8.txt. It shows
the following:

1) Drain starts and completes on 1.0.12
2) Cluster then starts on 1.1.8, and replays the commit log
3) As part of commitlog replay, it flushes various CFs including titan3/RMEntityCount/, which
is a counter columnfamily. The machine has 4gb of heap, and the flush happens while thrift is
down and before the node has moved to the normal state, so it seems reasonable to conjecture
that this flush is part of commitlog replay
4) It then logs "10698 replayed mutations", which further supports the idea that these
counts are part of the replay
5) The operator then noticed that a significant percentage of records in this columnfamily
had overcounted
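
The check described in the steps above can be sketched as a small log scan. This is only an
illustration, not a tool from the ticket: the function name and the sample lines are
hypothetical, and the two markers it looks for ("DRAINED" and "<N> replayed mutations") are
assumed from the wording of the reports rather than taken from Cassandra's source. It reports
how many mutations were replayed after the most recent drain, which should be zero (or absent)
if the drain fully flushed the commitlog.

```python
import re

# Assumed markers, based on the wording of the reports, not on Cassandra's source.
DRAINED = re.compile(r"\bDRAINED\b")
REPLAYED = re.compile(r"(\d+) replayed mutations")


def replayed_after_drain(lines):
    """Return the replayed-mutation count logged after the last DRAINED line.

    Returns None if no DRAINED line is seen, or if no replay count follows it.
    """
    drained_seen = False
    replayed = None
    for line in lines:
        if DRAINED.search(line):
            # A new drain resets anything counted after an earlier drain.
            drained_seen = True
            replayed = None
        elif drained_seen:
            match = REPLAYED.search(line)
            if match:
                replayed = int(match.group(1))
    return replayed


# Hypothetical sample lines; real log lines carry timestamps and thread names.
sample_log = [
    "INFO StorageService DRAINED",
    "INFO CommitLog Log replay complete, 10698 replayed mutations",
]

if __name__ == "__main__":
    print(replayed_after_drain(sample_log))
```

A non-zero result on a node that was drained before shutdown would match the symptom
reported here.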
                
> nodetool drain sometimes doesn't mark commitlog fully flushed
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-4446
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4446
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core, Tools
>         Environment: ubuntu 10.04 64bit
> Linux HOSTNAME 2.6.32-345-ec2 #48-Ubuntu SMP Wed May 2 19:29:55 UTC 2012 x86_64 GNU/Linux
> sun JVM
> cassandra 1.0.10 installed from apache deb
>            Reporter: Robert Coli
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.2.1
>
>         Attachments: 4446.txt, cassandra.1.0.10.replaying.log.after.exception.during.drain.txt,
>                      CASSANDRA-4446--1.0.12_to_1.1.8.txt
>
>
> I recently wiped a customer's QA cluster. I drained each node and verified that they
> were drained. When I restarted the nodes, I saw the commitlog replay create a memtable and
> then flush it. I have attached a sanitized log snippet from a representative node at the time.
>
> It appears to show the following:
> 1) Drain begins
> 2) Drain triggers flush
> 3) Flush triggers compaction
> 4) StorageService logs DRAINED message
> 5) Compaction thread throws an exception
> 6) On restart, the same CF creates a memtable
> 7) and then flushes it [1]
> The columnfamily involved in the replay in 7) is the CF for which the compaction thread
> threw an exception in 5). This seems to suggest a timing issue whereby the exception in 5)
> prevents the flush in 3) from marking all the segments flushed, causing them to replay
> after restart.
> In case it might be relevant, I did an online change of compaction strategy from Leveled
> to SizeTiered during the uptime period preceding this drain.
> [1] Isn't commitlog replay supposed to avoid automatically triggering a flush in modern Cassandra?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
