cassandra-user mailing list archives

From shalom sagges <shalomsag...@gmail.com>
Subject Re: AbstractLocalAwareExecutorService Exception During Upgrade
Date Wed, 19 Jun 2019 09:37:47 GMT
Hi Again,

Bumping this thread, as I still haven't been able to find the root cause of
this issue.
Perhaps I need to upgrade to 3.0 first?
I'd be happy to hear any ideas.

Opened https://issues.apache.org/jira/browse/CASSANDRA-15172 with more
details.

Thanks!

On Thu, Jun 6, 2019 at 5:31 AM Jonathan Koppenhofer <jon@koppedomain.com>
wrote:

> Not sure about why repair is running, but we are also seeing the same
> merkle tree issue in a mixed version cluster in which we have intentionally
> started a repair against 2 upgraded DCs. We are currently researching, and
> can post back if we find the issue, but also would appreciate if someone
> has a suggestion. We have also run a local repair in an upgraded DC in this
> same mixed version cluster without issue.
>
> We are going from 2.1.x to 3.0.x... and yes, we know you are not supposed to
> run repairs in mixed version clusters, so don't do it :) This is kind of a
> special circumstance where other things have gone wrong.
>
> Thanks
>
> On Wed, Jun 5, 2019, 5:23 PM shalom sagges <shalomsagges@gmail.com> wrote:
>
>> If anyone has any idea on what might cause this issue, it'd be great.
>>
>> I don't understand what could trigger this exception.
>> But what I really can't understand is why repairs started to run suddenly
>> :-\
>> There's no cron job running, no active repair process, no Validation
>> compactions, Reaper is turned off....  I see repair running only in the
>> logs.
>>
>> Thanks!
>>
>>
>> On Wed, Jun 5, 2019 at 2:32 PM shalom sagges <shalomsagges@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I'm having a bad situation where after upgrading 2 nodes (binaries only)
>>> from 2.1.21 to 3.11.4 I'm getting a lot of warnings as follows:
>>>
>>> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-5,5,main]: {}
>>> java.lang.ArrayIndexOutOfBoundsException: null
>>>
>>>
>>> I also see errors on repairs but no repair is running at all. I verified
>>> this with the ps -ef command and nodetool compactionstats. The error I see is:
>>> Failed creating a merkle tree for [repair #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4 (see log for details)
>>>
>>> I saw repair errors on data tables as well.
>>> nodetool status shows all are UN and nodetool describecluster shows two
>>> schema versions as expected.
>>>
>>>
>>> After the warnings appeared, clients started seeing read/write queries
>>> time out.
>>> Restarting the 2 nodes solved the clients' connection issues, but the
>>> warnings are still being generated in the logs.
>>>
>>> Has anyone encountered such an issue and knows what it means?
>>>
>>> Thanks!
>>>
>>>
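
For anyone triaging the same symptom, here is a minimal sketch for tallying the "Failed creating a merkle tree" errors per table and endpoint, which may help show whether the failures involve only certain tables or peers. The regular expression is modelled only on the single error line quoted in this thread, and it assumes each error sits on one log line; the function name `summarize_merkle_failures` is hypothetical and the real log format may differ between Cassandra versions.

```python
import re
from collections import Counter

# Pattern modelled on the one error line quoted in this thread (assumption:
# the message is logged on a single line in this exact shape).
MERKLE_FAILURE = re.compile(
    r"Failed creating a merkle tree for \[repair #(?P<session>[0-9a-f-]+) "
    r"on (?P<keyspace>\w+)/(?P<table>\w+), .*?\], /(?P<endpoint>[\d.]+)"
)


def summarize_merkle_failures(lines):
    """Count merkle tree failures per (keyspace, table, endpoint)."""
    counts = Counter()
    for line in lines:
        match = MERKLE_FAILURE.search(line)
        if match:
            key = (match["keyspace"], match["table"], match["endpoint"])
            counts[key] += 1
    return counts


# Sample input copied from the error reported above, plus an unrelated line.
sample_lines = [
    "Failed creating a merkle tree for [repair "
    "#a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], "
    "/1.2.3.4 (see log for details)",
    "some unrelated log line",
]

for key, count in summarize_merkle_failures(sample_lines).items():
    print(key, count)
```

The same function could be fed the lines of a node's log file in place of `sample_lines`; lines that do not match the pattern are simply skipped.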
