From: Martin Xue
Date: Fri, 2 Aug 2019 16:45:13 +1000
Subject: Re: Repair failed and crash the node, how to bring it back?
To: user@cassandra.apache.org

Hi Alex,

Thanks, much appreciated.

Regards
Martin

On Thu, Aug 1, 2019 at 3:34 PM Alexander Dejanovski <alex@thelastpickle.com> wrote:

> Hi Martin,
>
> Apparently this is the hints bug you've been hit by:
> https://issues.apache.org/jira/browse/CASSANDRA-14080
> It was fixed in 3.0.17.
>
> You didn't provide the Cassandra logs from the time of the crash, only
> the output of nodetool, so it's hard to say what caused it. You may have
> been hit by this bug: https://issues.apache.org/jira/browse/CASSANDRA-14096
> This is unlikely to happen with Reaper (as mentioned in the description of
> the ticket), since it will generate smaller Merkle trees: each subrange
> covers fewer partitions per repair session.
>
> So the advice is: upgrade to 3.0.19 (or even 3.11.4 IMHO, as 3.0 offers less
> performance than 3.11) and use Reaper to handle/schedule repairs.
>
> Cheers,
>
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> On Thu, Aug 1, 2019 at 12:05 AM Martin Xue <martinxue@gmail.com> wrote:
>
>> Hi Alex,
>>
>> Thanks for your reply. The disk space was around 80%. The crash happened
>> during repair: a primary-range full repair on a 1TB keyspace.
>>
>> Would it crash again?
>>
>> Thanks
>> Regards
>> Martin
>>
>> On Thu., 1 Aug. 2019, 12:04 am Alexander Dejanovski <alex@thelastpickle.com> wrote:
>>
>>> It looks like you have a corrupted hint file.
>>> Did the node run out of disk space while repair was running?
>>>
>>> You might want to move the hint files out of their current directory and
>>> try to restart the node again.
>>> Since you'll have lost mutations then, you'll need... to run repair
>>> ¯\_(ツ)_/¯
>>>
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>>
>>> On Wed, Jul 31, 2019 at 3:51 PM Martin Xue <martinxue@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running repair in production, starting with one of the 6 nodes in
>>>> the cluster (3 nodes in each of two DCs). Cassandra version is 3.0.14.
>>>>
>>>> Running: repair -pr --full keyspace on node 1 (1TB of data); it took two
>>>> days and then crashed.
>>>>
>>>> The error shows:
>>>> 3202]] finished (progress: 3%)
>>>> Exception occurred during clean-up. java.lang.reflect.UndeclaredThrowableException
>>>> Cassandra has shutdown.
>>>> error: [2019-07-31 20:19:20,797] JMX connection closed. You should check server log for repair status of keyspace keyspace_masked (Subsequent keyspaces are not going to be repaired).
>>>> -- StackTrace --
>>>> java.io.IOException: [2019-07-31 20:19:20,797] JMX connection closed. You should check server log for repair status of keyspace keyspace_masked (Subsequent keyspaces are not going to be repaired).
>>>>     at org.apache.cassandra.tools.RepairRunner.handleConnectionFailed(RepairRunner.java:97)
>>>>     at org.apache.cassandra.tools.RepairRunner.handleConnectionClosed(RepairRunner.java:91)
>>>>     at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:90)
>>>>     at javax.management.NotificationBroadcasterSupport.handleNotification(NotificationBroadcasterSupport.java:275)
>>>>     at javax.management.NotificationBroadcasterSupport$SendNotifJob.run(NotificationBroadcasterSupport.java:352)
>>>>     at javax.management.NotificationBroadcasterSupport$1.execute(NotificationBroadcasterSupport.java:337)
>>>>     at javax.management.NotificationBroadcasterSupport.sendNotification(NotificationBroadcasterSupport.java:248)
>>>>     at javax.management.remote.rmi.RMIConnector.sendNotification(RMIConnector.java:441)
>>>>     at javax.management.remote.rmi.RMIConnector.close(RMIConnector.java:533)
>>>>     at javax.management.remote.rmi.RMIConnector.access$1300(RMIConnector.java:121)
>>>>     at javax.management.remote.rmi.RMIConnector$RMIClientCommunicatorAdmin.gotIOException(RMIConnector.java:1534)
>>>>     at javax.management.remote.rmi.RMIConnector$RMINotifClient.fetchNotifs(RMIConnector.java:1352)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.fetchOneNotif(ClientNotifForwarder.java:655)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.fetchNotifs(ClientNotifForwarder.java:607)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:471)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
>>>>     at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)
>>>>
>>>> system.log shows:
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,579 GCInspector.java:284 - G1 Young Generation GC in 2915ms.  G1 Eden Space: 914358272 -> 0; G1 Old Gen: 19043999248 -> 20219035248;
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,579 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,584 StatusLogger.java:56 - MutationStage                    19        15     9578177305         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,585 StatusLogger.java:56 - ViewMutationStage                 0         0              0         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,585 StatusLogger.java:56 - ReadStage                        10         0      219357504         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,585 StatusLogger.java:56 - RequestResponseStage              1         0      625174550         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,585 StatusLogger.java:56 - ReadRepairStage                   0         0        2544772         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,585 StatusLogger.java:56 - CounterMutationStage              0         0              0         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,585 StatusLogger.java:56 - MiscStage                         0         0              0         0                 0
>>>> INFO  [Service Thread] 2019-07-31 20:19:08,586 StatusLogger.java:56 - CompactionExecutor                1         1        9515493         0                 0
>>>>
>>>> When I restart Cassandra, it still fails; the error in system.log now
>>>> shows:
>>>>
>>>> INFO  [main] 2019-07-31 21:35:02,044 StorageService.java:575 - Cassandra version: 3.0.14
>>>> INFO  [main] 2019-07-31 21:35:02,044 StorageService.java:576 - Thrift API version: 20.1.0
>>>> INFO  [main] 2019-07-31 21:35:02,044 StorageService.java:577 - CQL supported versions: 3.4.0 (default: 3.4.0)
>>>> ERROR [main] 2019-07-31 21:35:02,075 CassandraDaemon.java:710 - Exception encountered during startup
>>>> org.apache.cassandra.io.FSReadError: java.io.EOFException
>>>>     at org.apache.cassandra.hints.HintsDescriptor.readFromFile(HintsDescriptor.java:142) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_171]
>>>>     at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_171]
>>>>     at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_171]
>>>>     at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_171]
>>>>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_171]
>>>>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_171]
>>>>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_171]
>>>>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_171]
>>>>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_171]
>>>>     at org.apache.cassandra.hints.HintsCatalog.load(HintsCatalog.java:65) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.hints.HintsService.<init>(HintsService.java:88) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.hints.HintsService.<clinit>(HintsService.java:63) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.service.StorageProxy.<clinit>(StorageProxy.java:121) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at java.lang.Class.forName0(Native Method) ~[na:1.8.0_171]
>>>>     at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_171]
>>>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346) [apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:569) [apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:697) [apache-cassandra-3.0.14.jar:3.0.14]
>>>> Caused by: java.io.EOFException: null
>>>>     at java.io.RandomAccessFile.readInt(RandomAccessFile.java:803) ~[na:1.8.0_171]
>>>>     at org.apache.cassandra.hints.HintsDescriptor.deserialize(HintsDescriptor.java:237) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     at org.apache.cassandra.hints.HintsDescriptor.readFromFile(HintsDescriptor.java:138) ~[apache-cassandra-3.0.14.jar:3.0.14]
>>>>     ... 20 common frames omitted
>>>>
>>>>
>>>> Can anyone help me bring the node back up?
>>>>
>>>> Also, there are anti-compactions (after repair) running on other nodes.
>>>> Should I stop them as well, and if so, how (nodetool stop compaction?)?
>>>>
>>>> Any suggestions will be much appreciated.
>>>>
>>>> Thanks
>>>> Regards
>>>> Martin
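
For reference, here is a minimal sketch of the workaround suggested in the thread: moving the hint files out of their directory before restarting the node. It is an illustration under assumptions, not something posted by the participants. The function name quarantine_hint_files and both directory arguments are hypothetical; the hints directory is assumed to be whatever hints_directory in cassandra.yaml points to on the affected node, and Cassandra is assumed to be stopped while the files are moved.

```python
import shutil
from pathlib import Path


def quarantine_hint_files(hints_dir, backup_dir):
    """Move every regular file in hints_dir into backup_dir.

    Returns the names of the moved files in sorted order.
    Both paths are supplied by the operator (hypothetical helper).
    """
    src = Path(hints_dir)
    dst = Path(backup_dir)
    # Create the backup location if it does not exist yet.
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(src.iterdir()):
        if path.is_file():  # leave subdirectories untouched
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved
```

Moving the files rather than deleting them keeps them available for later inspection. As noted in the thread, the mutations held in those hints are lost to the cluster, so a repair is still needed once the node is back up.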