cassandra-user mailing list archives

From Julien Anguenot <jul...@anguenot.org>
Subject Re: Cluster not working after upgrade from 2.1.12 to 3.5.0
Date Tue, 21 Jun 2016 19:17:28 GMT
AFAICT, the issue does not seem to be driver-related, as the duplicates
were showing up with both cqlsh and the Java driver. In addition,
the sstabledump output contained the actual duplicates (see the Jira
issue).
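
For reference, checking for the duplicates directly in the sstables looked
roughly like this (paths and keyspace/table names hypothetical; sstabledump
ships with 3.x):

    nodetool flush keyspace table
    sstabledump /var/lib/cassandra/data/keyspace/table-<table-id>/ma-1-big-Data.db

sstabledump prints the partitions and rows as JSON, so a partition listing the
same clustering key more than once means the duplicates are on disk and not an
artifact of the driver.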

On Tue, Jun 21, 2016 at 12:04 PM, Oskar Kjellin <oskar.kjellin@gmail.com> wrote:
> Did you see similar issues when querying using a driver? Because we get no results in
> the driver whatsoever.
>
> Sent from my iPhone
>
>> On 21 June 2016, at 18:50, Julien Anguenot <julien@anguenot.org> wrote:
>>
>> See my comments on the issue: I had to truncate and reinsert data in
>> these corrupted tables.
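>>
>> Roughly this (keyspace/table names hypothetical):
>>
>>     TRUNCATE keyspace.table;
>>     -- then re-insert the rows from the application / a backup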
>>
>> AFAIK, there is no evidence that UDTs are responsible for this bad behavior.
>>
>>> On Tue, Jun 21, 2016 at 11:45 AM, Oskar Kjellin <oskar.kjellin@gmail.com> wrote:
>>> Yeah, I saw that one. We're not using UDTs in the affected tables, though.
>>>
>>> Did you resolve it?
>>>
>>> Sent from my iPhone
>>>
>>>> On 21 June 2016, at 18:27, Julien Anguenot <julien@anguenot.org> wrote:
>>>>
>>>> I have experienced similar duplicate-primary-key behavior with a couple
>>>> of tables after upgrading from 2.2.x to 3.0.x.
>>>>
>>>> See my comments on the Jira issue I opened at the time:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11887
>>>>
>>>>
>>>>> On Tue, Jun 21, 2016 at 10:47 AM, Oskar Kjellin <oskar.kjellin@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> We've done this upgrade in both dev and stage before, and we did not see
>>>>> similar issues. After upgrading production today, though, we have a lot of issues.
>>>>>
>>>>> The main issue is that the Datastax client quite often does not get the data
>>>>> back (even though it's the same query). I see similar flakiness when simply
>>>>> running cqlsh: when it does return data, the data is broken.
>>>>>
>>>>> We are running a 3-node cluster with RF=3.
>>>>>
>>>>> I have this table:
>>>>>
>>>>> CREATE TABLE keyspace.table (
>>>>>     a text,
>>>>>     b text,
>>>>>     c text,
>>>>>     d list<text>,
>>>>>     e text,
>>>>>     f timestamp,
>>>>>     g list<text>,
>>>>>     h timestamp,
>>>>>     PRIMARY KEY (a, b, c)
>>>>> );
>>>>>
>>>>>
>>>>> Roughly every other time I query (not strictly alternating; it's random), I get:
>>>>>
>>>>>
>>>>> SELECT * FROM table WHERE a = 'xxx' AND b = 'xxx';
>>>>>
>>>>>  a   | b   | c   | d    | e    | f                               | g       | h
>>>>> -----+-----+-----+------+------+---------------------------------+---------+---------------------------------
>>>>>  xxx | xxx | ccc | null | null | 2089-11-30 23:00:00.000000+0000 | ['fff'] | 2014-12-31 23:00:00.000000+0000
>>>>>  xxx | xxx | ddd | null | null | 2099-01-01 00:00:00.000000+0000 | ['fff'] | 2016-06-17 13:29:36.000000+0000
>>>>>
>>>>>
>>>>> This is the expected output.
>>>>>
>>>>>
>>>>> But I also get:
>>>>>
>>>>>  a   | b   | c   | d    | e    | f                               | g       | h
>>>>> -----+-----+-----+------+------+---------------------------------+---------+---------------------------------
>>>>>  xxx | xxx | ccc | null | null | null                            | null    | null
>>>>>  xxx | xxx | ccc | null | null | 2089-11-30 23:00:00.000000+0000 | ['fff'] | null
>>>>>  xxx | xxx | ccc | null | null | null                            | null    | 2014-12-31 23:00:00.000000+0000
>>>>>  xxx | xxx | ddd | null | null | null                            | null    | null
>>>>>  xxx | xxx | ddd | null | null | 2099-01-01 00:00:00.000000+0000 | ['fff'] | null
>>>>>  xxx | xxx | ddd | null | null | null                            | null    | 2016-06-17 13:29:36.000000+0000
>>>>>
>>>>>
>>>>> Notice that the same PK is returned three times, each with a different part of
>>>>> the data. I believe this is what's currently killing our production environment.
>>>>>
>>>>>
>>>>> I'm running upgradesstables at this moment, but it hasn't finished yet. I
>>>>> started a repair before that, but nothing happened. upgradesstables has now
>>>>> finished on 2 out of 3 nodes, but production is still down :/
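>>>>>
>>>>> For clarity, the commands were roughly these (with the real keyspace/table
>>>>> names in place):
>>>>>
>>>>>     nodetool upgradesstables keyspace table
>>>>>     nodetool repair keyspace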
>>>>>
>>>>>
>>>>> We also see these in the logs, over and over again:
>>>>>
>>>>> DEBUG [ReadRepairStage:4] 2016-06-21 15:44:01,119 ReadCallback.java:235 - Digest mismatch:
>>>>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-1566729966326640413, 336b35356c49537731797a4a5f64627a797236) (b3dcfcbeed6676eae7ff88cc1bd251fb vs 6e7e9225871374d68a7cdb54ae70726d)
>>>>>     at org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) ~[apache-cassandra-3.5.0.jar:3.5.0]
>>>>>     at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:226) ~[apache-cassandra-3.5.0.jar:3.5.0]
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
>>>>>
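>>>>> A sanity check I can run, if useful: forcing all three replicas to answer
>>>>> from cqlsh, which should trigger a blocking read repair on the mismatching
>>>>> key:
>>>>>
>>>>>     CONSISTENCY ALL;
>>>>>     SELECT * FROM table WHERE a = 'xxx' AND b = 'xxx';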
>>>>>
>>>>> Any help is much appreciated.
>>>>
>>>>
>>>>
>>>> --
>>>> Julien Anguenot (@anguenot)
>>
>>
>>
>> --
>> Julien Anguenot (@anguenot)



-- 
Julien Anguenot (@anguenot)
