From: Jonathan Koppenhofer
Date: Wed, 5 Jun 2019 22:30:50 -0400
Subject: Re: AbstractLocalAwareExecutorService Exception During Upgrade
To: user@cassandra.apache.org
Not sure why repair is running, but we are also seeing the same merkle tree issue in a mixed-version cluster in which we intentionally started a repair against 2 upgraded DCs. We are currently researching and can post back if we find the cause, but would also appreciate any suggestions. We have also run a local repair in an upgraded DC in this same mixed-version cluster without issue.

We are going 2.1.x to 3.0.x... and yes, we know you are not supposed to run repairs in mixed-version clusters, so don't do it :) This is a special circumstance where other things have gone wrong.

Thanks

On Wed, Jun 5, 2019, 5:23 PM shalom sagges wrote:

> If anyone has any idea on what might cause this issue, it'd be great.
>
> I don't understand what could trigger this exception. But what I really
> can't understand is why repairs started to run suddenly :-\
> There's no cron job running, no active repair process, no Validation
> compactions, Reaper is turned off... I see repair running only in the logs.
>
> Thanks!
>
> On Wed, Jun 5, 2019 at 2:32 PM shalom sagges wrote:
>
>> Hi All,
>>
>> I'm having a bad situation where, after upgrading 2 nodes (binaries only)
>> from 2.1.21 to 3.11.4, I'm getting a lot of warnings as follows:
>>
>> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread
>> Thread[ReadStage-5,5,main]: {}
>> java.lang.ArrayIndexOutOfBoundsException: null
>>
>> I also see errors on repairs, but no repair is running at all. I verified
>> this with the ps -ef command and nodetool compactionstats. The error I see is:
>> Failed creating a merkle tree for [repair
>> #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4
>> (see log for details)
>>
>> I saw repair errors on data tables as well.
>> nodetool status shows all nodes are UN, and nodetool describecluster shows
>> two schema versions, as expected during an upgrade.
>>
>> After the warnings appeared, clients started to get timed-out read/write
>> queries.
>> Restarting the 2 nodes solved the clients' connection issues, but the
>> warnings are still being generated in the logs.
>>
>> Did anyone encounter such an issue and know what it means?
>>
>> Thanks!
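[Editor's note] Since the repairs show up only in the logs, one way to see what repair activity the node believes is happening is to pull the repair session IDs out of system.log. This is a sketch under the assumption that the log format matches 3.x; the sample log lines below are illustrative, modeled on the error quoted above, not taken from the poster's cluster:

```shell
# Write a small sample log (hypothetical lines in 3.x system.log style)
cat > /tmp/system.log.sample <<'EOF'
INFO  [AntiEntropyStage:1] 2019-06-05 14:31:02 RepairSession.java - [repair #a95498f0-8783-11e9-b065-81cdbc6bee08] new session: will sync /1.2.3.4 for system_auth.[users]
ERROR [AntiEntropyStage:1] 2019-06-05 14:31:05 RepairSession.java - Failed creating a merkle tree for [repair #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4 (see log for details)
EOF

# Extract the distinct repair session IDs mentioned in the log;
# correlating these IDs across nodes shows which node initiated each session
grep -o 'repair #[0-9a-f-]*' /tmp/system.log.sample | sort -u
# → repair #a95498f0-8783-11e9-b065-81cdbc6bee08
```

Running the same extraction against /var/log/cassandra/system.log on each node, and then searching for those IDs on the other nodes, should reveal which host is originating the sessions (e.g. a forgotten scheduler or a client issuing repair via JMX).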