From: graham sanderson
Subject: Strange slow schema agreement on 2.0.9 ... anyone seen this?
Date: Fri, 8 Aug 2014 15:45:14 -0500
To: user@cassandra.apache.org

We recently upgraded C* from 2.0.5 to 2.0.9.

We have some data that is partitioned in tables created periodically (once a day). This morning, this automated process timed out because the schema did not reach agreement quickly enough after we created a new empty table.

I was able to reproduce this manually via CQLSH. When I created the table and ran nodetool describecluster, it showed 3 nodes on the old schema and 3 nodes on the new schema instantly (or as quickly as I could run nodetool describecluster). It took almost exactly a minute for the other nodes to switch.

The nodes weren't busy; the machines, the network and the JVMs were healthy; nodetool status, gossipinfo and OpsCenter all looked happy.

We never saw this issue in beta on 2.0.9 or anywhere on 2.0.5, and yesterday on 2.0.9 after the upgrade it worked correctly.

The only clue I have is that in this case the nodes which were slow to update called DefsTables.mergeSchema from InternalResponseStage, not MigrationStage (which is where it is called as I test it now).

Looking at the logs, these InternalResponseStage calls happened eerily close (within a second) to exactly a minute later.

Having discovered nothing else wrong, I restarted one of the "slow" nodes, and the problem went away (for that node). So now the cluster has been rolling restarted, and is proceeding fine.

Anyway, I will dig a little deeper as to why (when all nodes think each other are up) the migration verb might not get executed (there were no errors in any logs)... mostly wondering if this rings a bell with anyone.
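
For anyone wanting to reproduce the check, here is a rough Python sketch of the kind of wait described above: run nodetool describecluster repeatedly until every node reports the same schema version, or give up after a timeout. This is not the script from the original post. The function names, the timeout values and the assumed layout of the "Schema versions:" section (one "<uuid>: [host, host]" line per version, plus an optional "UNREACHABLE: [...]" line) are assumptions for illustration, so check them against the output of your own Cassandra version.

```python
import re
import subprocess
import time

# Assumed line shape inside "Schema versions:", e.g.
#   65e78f0e-e81e-30d8-a631-a65dff93bf82: [10.0.0.1, 10.0.0.2]
#   UNREACHABLE: [10.0.0.9]
_VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def parse_schema_versions(describecluster_output):
    """Map each schema version (or 'UNREACHABLE') to the hosts reporting it."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            hosts = [h.strip() for h in match.group(2).split(",") if h.strip()]
            versions[match.group(1)] = hosts
    return versions


def _run_nodetool():
    # Requires nodetool on the PATH of the machine running this script.
    return subprocess.check_output(
        ["nodetool", "describecluster"], universal_newlines=True
    )


def wait_for_schema_agreement(timeout_s=120.0, poll_s=2.0, run=_run_nodetool):
    """Poll until exactly one schema version is reported and no node is
    unreachable. Returns True on agreement, False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        versions = parse_schema_versions(run())
        if len(versions) == 1 and "UNREACHABLE" not in versions:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling wait_for_schema_agreement(timeout_s=120) right after the CREATE TABLE would have ridden out the one-minute delay seen here. A longer timeout only hides the symptom; it does not explain why the migration was delayed.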