X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 224227FD1
    for ; Wed, 9 Nov 2011 13:58:16 +0000 (UTC)
Received: (qmail 36440 invoked by uid 500); 9 Nov 2011 13:58:16 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 36416 invoked by uid 500); 9 Nov 2011 13:58:16 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 36408 invoked by uid 99); 9 Nov 2011 13:58:15 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Nov 2011 13:58:15 +0000
X-ASF-Spam-Status: No, hits=-2001.2 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Nov 2011 13:58:12 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id A460444570
    for ; Wed, 9 Nov 2011 13:57:51 +0000 (UTC)
Date: Wed, 9 Nov 2011 13:57:51 +0000 (UTC)
From: Jonas Borgström (Commented) (JIRA)
To: commits@cassandra.apache.org
Message-ID: <1073006050.14365.1320847071674.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1510324170.7567.1320693411652.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3466) Hinted handoff not working after rolling upgrade from 0.8.7 to 1.0.2
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV
    on apache.org

    [ https://issues.apache.org/jira/browse/CASSANDRA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147054#comment-13147054 ]

Jonas Borgström commented on CASSANDRA-3466:
--------------------------------------------

> I haven't been able to reproduce the assertion errors, but I did find what is preventing hint delivery in some cases

Brandon, did you verify that removing those lines of code actually fixes hint delivery?

Instead of changing the code, I ran a quick experiment: "nodetool flush" on the node holding the hints, followed by a restart of the other node. That was not enough to trigger hint delivery:

{code}
Node1 notices that node2 is back up:

 INFO 14:41:50,752 Node /127.0.0.2 has restarted, now UP
 INFO 14:41:50,752 InetAddress /127.0.0.2 is now UP
 INFO 14:41:50,753 Node /127.0.0.2 state jump to normal

But no hints are delivered...

nodetool flush is used to make sure hints hit the disk on node1:

 INFO 14:42:32,675 Enqueuing flush of Memtable-Versions@1503666327(83/103 serialized/live bytes, 3 ops)
 INFO 14:42:32,675 Writing Memtable-Versions@1503666327(83/103 serialized/live bytes, 3 ops)
 INFO 14:42:32,681 Completed flushing /tmp/node1/data/data/system/Versions-h-1-Data.db (247 bytes)
 INFO 14:42:32,682 Enqueuing flush of Memtable-HintsColumnFamily@737188401(177/221 serialized/live bytes, 1 ops)
 INFO 14:42:32,682 Writing Memtable-HintsColumnFamily@737188401(177/221 serialized/live bytes, 1 ops)
 INFO 14:42:32,688 Completed flushing /tmp/node1/data/data/system/HintsColumnFamily-h-1-Data.db (277 bytes)
 INFO 14:42:32,691 Enqueuing flush of Memtable-bar@1831941861(17/21 serialized/live bytes, 1 ops)
 INFO 14:42:32,691 Writing Memtable-bar@1831941861(17/21 serialized/live bytes, 1 ops)
 INFO 14:42:32,694 Completed flushing /tmp/node1/data/data/foo/bar-h-1-Data.db (68 bytes)

Node2 is restarted once more to check if this will trigger hint delivery:

 INFO 14:42:54,650 InetAddress /127.0.0.2 is now dead.
 INFO 14:43:02,628 Node /127.0.0.2 has restarted, now UP
 INFO 14:43:02,629 InetAddress /127.0.0.2 is now UP
 INFO 14:43:02,629 Node /127.0.0.2 state jump to normal

Still nothing...
Restarting node1 will deliver the hints within a few seconds though...
{code}

Reproducing the assertion error is a bit tricky. But after letting my two-node test cluster perform hint delivery for each other a few times, I was able to reproduce it once more. Is there anything special you would like me to test?

> Hinted handoff not working after rolling upgrade from 0.8.7 to 1.0.2
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-3466
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3466
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Jonas Borgström
>            Assignee: Brandon Williams
>              Labels: hintedhandoff
>             Fix For: 1.0.3
>
>
> While testing rolling upgrades from 0.8.7 to 1.0.2 on a test cluster, I noticed that hinted handoff didn't always work properly. Hints generated on an upgraded node do not seem to be delivered to other newly upgraded nodes once they rejoin the ring. The only way I've found to get a node to deliver its hints is to restart it.
> Here are some steps to reproduce this issue:
> 1. Install cassandra 0.8.7 on node1 and node2 using default settings.
> 2. Create keyspace foo with {replication_factor: 2}. Create column family bar.
> 3. Shut down node2.
> 4. Insert data into bar and verify that HintsColumnFamily on node1 contains hints.
> 5. Start node2 and verify that hinted handoff is performed and HintsColumnFamily becomes empty again.
> 6. Upgrade and restart node1.
> 7. Shut down node2.
> 8. Insert data into bar and verify that HintsColumnFamily on node1 contains hints.
> 9. Upgrade and start node2.
> 10. Notice that hinted handoff is *not* performed when "node2" comes back. (It is performed only if node1 is restarted.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
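
The manual check described in the comment (a peer is reported UP in node1's log, but no hint delivery follows) could be automated roughly as below. This is a minimal sketch, not part of Cassandra: the function name `endpoints_without_handoff` is hypothetical, and the pattern used to recognize a hint-delivery log line is an assumption, because no such line appears anywhere in this thread. Only the "is now UP" wording is taken from the log excerpt.

```python
import re

# Gossip line wording copied from the log excerpt in the comment.
UP_RE = re.compile(r"InetAddress (/\S+) is now UP")

# ASSUMPTION: the wording of a hint-delivery log line is not shown in this
# thread. This default is a placeholder; adjust it to match your version.
DEFAULT_HANDOFF_RE = re.compile(
    r"hinted handoff.*?(/\d+\.\d+\.\d+\.\d+)", re.IGNORECASE
)


def endpoints_without_handoff(log_lines, handoff_re=DEFAULT_HANDOFF_RE):
    """Return endpoints that were reported UP with no later handoff line."""
    pending = []
    for line in log_lines:
        up = UP_RE.search(line)
        if up:
            endpoint = up.group(1)
            if endpoint not in pending:
                pending.append(endpoint)
            continue
        handoff = handoff_re.search(line)
        if handoff and handoff.group(1) in pending:
            # A handoff line for this endpoint clears the pending entry.
            pending.remove(handoff.group(1))
    return pending


if __name__ == "__main__":
    # Lines taken from the node1 log excerpt in the comment.
    excerpt = [
        " INFO 14:41:50,752 Node /127.0.0.2 has restarted, now UP",
        " INFO 14:41:50,752 InetAddress /127.0.0.2 is now UP",
        " INFO 14:41:50,753 Node /127.0.0.2 state jump to normal",
    ]
    print(endpoints_without_handoff(excerpt))  # prints ['/127.0.0.2']
```

Run against the full node1 log, a non-empty result would correspond to the "Still nothing..." observation; passing a different `handoff_re` lets the check follow whatever wording the running version actually logs.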