cassandra-commits mailing list archives

From "Sylvain Lebresne (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-3804) upgrade problems from 1.0 to trunk
Date Thu, 16 Feb 2012 14:39:00 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3804:
----------------------------------------

    Attachment: node2.log
                node1.log

Tried the two attached patches. They do remove a bunch of the exceptions. For some reason
there are still quite a few EOFExceptions related to truncation, but the test doesn't truncate
at all, so I'm not sure where those are coming from. I'm attaching the logs from the nodes for
reference (those are the logs after the two patches are applied). node1.log is the 1.1 node and
node2.log is a 1.0 node. The thing that triggers those exceptions is the creation of a CF on
node2 (the old one).

So it'd be nice to figure out what triggers those exceptions, but if we're going to patch both
1.1 and 1.0, why not just (or rather, in addition) have schema changes check the (known) versions
of all the other nodes before doing anything, and just throw an InvalidRequestException if we know
the schema change will fail?
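
Roughly the kind of check I have in mind (illustrative Python pseudocode only; the real guard
would live in the Java schema-migration path, and the names and the minimum version below are
made up):

{code}
# Hypothetical sketch: reject a schema change up front if any known node is too old.
MIN_MIGRATION_VERSION = (1, 1)  # assumed: oldest release that understands the new schema format

class InvalidRequestError(Exception):
    """Stand-in for Thrift's InvalidRequestException."""

def validate_schema_change(known_versions):
    """known_versions: node address -> (major, minor) release version learned from gossip."""
    too_old = [node for node, version in known_versions.items()
               if version < MIN_MIGRATION_VERSION]
    if too_old:
        # Fail fast on the coordinator instead of letting the migration
        # produce EOFExceptions on the older nodes.
        raise InvalidRequestError("schema change rejected, nodes still on an older version: %s"
                                  % ", ".join(too_old))
{code}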

                
> upgrade problems from 1.0 to trunk
> ----------------------------------
>
>                 Key: CASSANDRA-3804
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.0
>         Environment: ubuntu, cluster set up with ccm.
>            Reporter: Tyler Patterson
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1.0
>
>         Attachments: CASSANDRA-3804-1.1.patch, CASSANDRA-3804.patch, node1.log, node2.log
>
>
> A 3-node cluster is on version 0.8.9, 1.0.6, or 1.0.7 and then one and only one node
> is taken down, upgraded to trunk, and started again. An rpc timeout exception happens if
> counter-add operations are done. It usually takes between 1 and 500 add operations before
> the failure occurs. The failure seems to happen sooner if the coordinator node is NOT the
> one that was upgraded. Here is the error:
> {code}
> ======================================================================
> ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
>     self.test(*self.arg)
>   File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
>     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
>   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
>     raise cql.OperationalError("Request did not complete within rpc_timeout.")
> OperationalError: Request did not complete within rpc_timeout.
> {code}
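> Roughly what the test boils down to (an illustrative sketch only; the connect() arguments,
> host, and keyspace name here are assumptions, not copied from counter_upgrade_test.py):
> {code}
> import cql
>
> # Point the driver at the non-upgraded coordinator (Thrift port 9160);
> # the host and keyspace name are placeholders.
> conn = cql.connect('127.0.0.1', 9160, 'ks')
> cursor = conn.cursor()
>
> # The rpc_timeout usually shows up within the first few hundred increments.
> for _ in range(500):
>     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
> {code}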
> A script has been added to cassandra-dtest (counter_upgrade_test.py) to demonstrate the
> failure. The newest version of CCM is required to run the test. It is available here if it
> hasn't yet been pulled: git@github.com:tpatterson/ccm.git

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
