Date: Wed, 31 Oct 2018 20:42:00 +0000 (UTC)
From: "Dinesh Joshi (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Assigned] (CASSANDRA-14831) Nodetool repair hangs with java.net.SocketException: End-of-stream reached

    [ https://issues.apache.org/jira/browse/CASSANDRA-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi reassigned CASSANDRA-14831:
----------------------------------------

    Assignee: Dinesh Joshi

> Nodetool repair hangs with java.net.SocketException: End-of-stream reached
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14831
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14831
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Repair
>            Reporter: Tania S Engel
>            Assignee: Dinesh Joshi
>            Priority: Major
>             Fix For: 3.11.1
>
>         Attachments: Cassandra - 14831 Logs.mht
>
>
> Using Cassandra 3.11.1.
> Ran nodetool repair on a small 3-node cluster from node 3eef.
> Nodes 9160 and 3f5e experienced a stream failure.
> *NODE 9160:*
> ERROR [STREAM-IN-/fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e:7000] 2018-10-16 01:45:00,400 StreamSession.java:593 - [Stream #103fe070-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session with peer fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e
> *java.net.SocketException: End-of-stream reached*
> at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:71) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
>
> *NODE 3f5e:*
> ERROR [STREAM-IN-/fd70:616e:6761:6561:ec4:7aff:fece:9160:59676] 2018-10-16 01:45:09,474 StreamSession.java:593 - [Stream #103ef610-d0e5-11e8-a993-5929a1c131b4] Streaming error occurred on session with peer fd70:616e:6761:6561:ec4:7aff:fece:9160
> java.io.IOException: An existing connection was forcibly closed by the remote host
> at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152]
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152]
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152]
> at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_152]
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_152]
> at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) ~[na:1.8.0_152]
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_152]
> at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_152]
> at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
>
> *NODE 3EEF:*
> ERROR [RepairJobTask:14] 2018-10-16 01:45:00,457 RepairSession.java:281 - [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1] Session completed with the following error
> org.apache.cassandra.exceptions.RepairException: [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/XXXXXX, [(-8271925838625565988,-8266397600493941101], (2290821710735817606,2299380749828706426] …(-8701313305140908434,-8686533141993948378]]] Sync failed between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e
> at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_152]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152]
>
> ERROR [RepairJobTask:14] 2018-10-16 01:45:00,459 RepairRunnable.java:276 - Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for range [(-8271925838625565988,-8266397600493941101],…(-6146831664074703724,-6117107236121156255], (4842256698807887573,4848113042863615717], (-8701313305140908434,-8686533141993948378]] failed with error [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/auditsearchlog,…(-8701313305140908434,-8686533141993948378]]] Sync failed between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e
> org.apache.cassandra.exceptions.RepairException: [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/auditsearchlog, [(-8271925838625565988,-8266397600493941101], …(4842256698807887573,4848113042863615717], (-8701313305140908434,-8686533141993948378]]] Sync failed between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e
> at org.apache.cassandra.repair.RemoteSyncTask.syncComplete(RemoteSyncTask.java:67) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:202) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:495) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:162) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_152]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152]
>
> *NODETOOL OUTPUT: shows the failure but then never returns.*
>
> [2018-10-16 01:43:57,310] Starting repair command #8 (f26bc4b0-d0e4-11e8-9926-bf64f35712c1), repairing keyspace logs with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 768, pull repair: false)
> [2018-10-16 01:45:00,462] Repair session f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 for range …
> (4842256698807887573,4848113042863615717], (-8701313305140908434,-8686533141993948378]] failed with error [repair #f2ab3eb0-d0e4-11e8-9926-bf64f35712c1 on logs/XXXXXX,
> [(-8271925838625565988,-8266397600493941101], (2290821710735817606,2299380749828706426], …
> (4842256698807887573,4848113042863615717], (-8701313305140908434,-8686533141993948378]]] Sync failed between /fd70:616e:6761:6561:ec4:7aff:fece:9160 and /fd70:616e:6761:6561:ae1f:6bff:fe12:3f5e (progress: 0%)
>
> Streaming does continue between the 3 nodes (see the attached partial logs from all 3 nodes), but then it stops, and we never see the repair command finish. About 15 hours later, we ran nodetool repair logs again. It failed, and this time the error indicated that there was an active repair session. The only thing that seemed to get us out of this state was a reboot of all the nodes.
>
> *ERROR [ValidationExecutor:27] 2018-10-16 17:14:39,241 ActiveRepairService.java:558 - Cannot start multiple repair sessions over the same sstables*
> ERROR [ValidationExecutor:27] 2018-10-16 17:14:39,241 Validator.java:268 - Failed creating a merkle tree for [repair #da436780-d166-11e8-9926-bf64f35712c1 on logs/YYYYY, [(-8271925838625565988,-8266397600493941101], ...
> /fd70:616e:6761:6561:ae1f:6bff:fe12:3ee4 (see log for details)
> ERROR [ValidationExecutor:27] 2018-10-16 17:14:39,244 CassandraDaemon.java:228 - Exception in thread Thread[ValidationExecutor:27,1,main]
> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
> at org.apache.cassandra.service.ActiveRepairService$ParentRepairSession.markSSTablesRepairing(ActiveRepairService.java:559) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1446) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1348) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:86) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:942) ~[apache-cassandra-3.11.1.jar:3.11.1]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_152]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_152]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
> at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_152]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
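The final "Cannot start multiple repair sessions over the same sstables" error is consistent with the first, hung repair session still holding its SSTables marked as repairing. The Java sketch below is a hypothetical, simplified model of that kind of guard, not Cassandra's ActiveRepairService code: the class RepairGuardSketch, its methods, and the SSTable names are invented for illustration. It assumes the marks live only in memory, which would match the report that rebooting the nodes cleared the condition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical, simplified model of an "SSTables already being repaired" guard.
// Not Cassandra's ActiveRepairService; all names here are invented for illustration.
public class RepairGuardSketch {
    // Assumption: ownership is tracked only in memory, so it is lost on restart.
    private final Map<String, UUID> owner = new HashMap<>();

    // Marks every SSTable as repairing for the given session, or rejects the whole
    // request if any SSTable is already owned by a different session.
    public synchronized void markRepairing(UUID session, Set<String> sstables) {
        for (String sstable : sstables) {
            UUID current = owner.get(sstable);
            if (current != null && !current.equals(session)) {
                throw new IllegalStateException(
                        "Cannot start multiple repair sessions over the same sstables");
            }
        }
        for (String sstable : sstables) {
            owner.put(sstable, session);
        }
    }

    // Releases everything held by a session; a hung session never reaches this call.
    public synchronized void release(UUID session) {
        owner.values().removeIf(session::equals);
    }

    public synchronized boolean isRepairing(String sstable) {
        return owner.containsKey(sstable);
    }

    public static void main(String[] args) {
        Set<String> sstables = new HashSet<>(Arrays.asList("sstable-1", "sstable-2"));
        RepairGuardSketch node = new RepairGuardSketch();

        // First repair marks the SSTables, then hangs without calling release().
        node.markRepairing(UUID.randomUUID(), sstables);

        // A later repair over the same SSTables is rejected.
        try {
            node.markRepairing(UUID.randomUUID(), sstables);
        } catch (IllegalStateException e) {
            System.out.println("second repair rejected: " + e.getMessage());
        }

        // A restart discards in-memory state; modeled here as a fresh instance.
        RepairGuardSketch restarted = new RepairGuardSketch();
        restarted.markRepairing(UUID.randomUUID(), sstables);
        System.out.println("repair after restart accepted: " + restarted.isRepairing("sstable-1"));
    }
}
```

In this model, checking every SSTable before marking any of them keeps a rejected request from leaving a partial claim behind, so only the session that never released its marks blocks later repairs.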