cassandra-user mailing list archives

From Jonathan Koppenhofer <...@koppedomain.com>
Subject Re: AbstractLocalAwareExecutorService Exception During Upgrade
Date Thu, 06 Jun 2019 02:30:50 GMT
Not sure why repair is running, but we are seeing the same
merkle tree issue in a mixed-version cluster in which we intentionally
started a repair against 2 upgraded DCs. We are currently researching and
can post back if we find the cause, but we would also appreciate it if
someone has a suggestion. We have also run a local repair in an upgraded DC
in this same mixed-version cluster without issue.

We are going from 2.1.x to 3.0.x... and yes, we know you are not supposed to
run repairs in mixed-version clusters, so don't do it :) This is kind of a
special circumstance where other things have gone wrong.

Thanks

On Wed, Jun 5, 2019, 5:23 PM shalom sagges <shalomsagges@gmail.com> wrote:

> If anyone has any idea on what might cause this issue, it'd be great.
>
> I don't understand what could trigger this exception.
> But what I really can't understand is why repairs started to run suddenly
> :-\
> There's no cron job running, no active repair process, no Validation
> compactions, Reaper is turned off....  I see repair running only in the
> logs.
>
> Thanks!
>
>
> On Wed, Jun 5, 2019 at 2:32 PM shalom sagges <shalomsagges@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I'm having a bad situation where after upgrading 2 nodes (binaries only)
>> from 2.1.21 to 3.11.4 I'm getting a lot of warnings as follows:
>>
>> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread
>> Thread[ReadStage-5,5,main]: {}
>> java.lang.ArrayIndexOutOfBoundsException: null
>>
>>
>> I also see errors on repairs but no repair is running at all. I verified
>> this with ps -ef command and nodetool compactionstats. The error I see is:
>> Failed creating a merkle tree for [repair
>> #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4
>> (see log for details)
>>
>> I saw repair errors on data tables as well.
>> nodetool status shows all are UN and nodetool describecluster shows two
>> schema versions as expected.
>>
>>
>> After the warnings appeared, clients started to get timed out read/write
>> queries.
>> Restarting the 2 nodes solved the clients' connection issues, but the
>> warnings are still being generated in the logs.
>>
>> Has anyone encountered such an issue, and does anyone know what it means?
>>
>> Thanks!
>>
>>
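For anyone triaging similar logs, here is a minimal sketch (not from the original posters) that pulls the repair session ID, keyspace/table, and endpoint out of "Failed creating a merkle tree" messages. The pattern is modeled only on the error text quoted above; the exact log layout in a given Cassandra version is an assumption, and the function name `find_merkle_failures` is hypothetical. It only helps see which tables and nodes are affected and does not explain why the repairs started.

```python
import re
from collections import Counter

# Pattern modeled on the error text quoted in this thread. The real log
# layout may differ between Cassandra versions (assumption).
MERKLE_FAILURE = re.compile(
    r"Failed creating a merkle tree for \[repair\s+"
    r"#(?P<session>[0-9a-f-]{36}) on (?P<keyspace>\w+)/(?P<table>\w+),"
    r".*?\],\s*/(?P<endpoint>[0-9a-fA-F.:]+)"
)


def find_merkle_failures(text):
    """Return one dict per merkle tree failure found in the given log text."""
    # Collapse line wraps so a message split across lines still matches.
    normalized = " ".join(text.split())
    return [m.groupdict() for m in MERKLE_FAILURE.finditer(normalized)]


if __name__ == "__main__":
    sample = """Failed creating a merkle tree for [repair
    #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4
    (see log for details)"""

    failures = find_merkle_failures(sample)
    # Count failures per keyspace/table to see which tables are hit most.
    per_table = Counter(f"{f['keyspace']}/{f['table']}" for f in failures)
    for table, count in per_table.most_common():
        print(table, count)
```

Grouping the results by endpoint instead of by table would show whether the failures come only from nodes still on the old version.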
