From: sankalp kohli
Date: Mon, 5 Aug 2013 18:25:41 -0700
Subject: Re: Unable to bootstrap node
To: user@cassandra.apache.org
Cc: Don Jackson
Reply-To: user@cassandra.apache.org

Let me know if this fixes the problem.


On Mon, Aug 5, 2013 at 6:24 PM, sankalp kohli wrote:

> So the problem is that when you dropped and recreated the table with the
> same name, somehow the old CFStore object was not purged. So now there
> were two objects, which caused the same sstable to have two SSTableReader
> objects.
>
> The fix is to find all nodes that are emitting this FileNotFoundException
> and restart them.
>
> In your case, restart the node that is serving the data and emitting the
> FileNotFoundException.
>
> Once it is up, restart the bootstrapping node again with the bootstrap
> argument. Now it will successfully stream the data.
>
>
> On Mon, Aug 5, 2013 at 6:08 PM, Keith Wright wrote:
>
>> Yes, we likely dropped and recreated tables. If we stop the sending
>> node, what will happen to the bootstrapping node?
>>
>> sankalp kohli wrote:
>>
>> Hi,
>> The problem is that the node sending the stream is hitting this
>> FileNotFoundException. You need to restart this node and it should fix
>> the problem.
>>
>> Are you seeing a lot of FileNotFoundExceptions? Did you make any schema
>> changes recently?
>>
>> Sankalp
>>
>>
>> On Mon, Aug 5, 2013 at 5:39 PM, Keith Wright wrote:
>>
>>> Hi all,
>>>
>>> I have been trying to bootstrap a new node into my 7-node 1.2.4 C*
>>> cluster (vnodes, RF 3) with no luck. It gets close to completing and
>>> then the streaming just stalls at 99% from 1 or 2 nodes. Nodetool
>>> netstats shows the items that have yet to stream, but the logs on the
>>> new node do not show any errors. I tried shutting down the node,
>>> clearing all data/commit logs/caches, and re-bootstrapping, with no
>>> luck. The nodes that are hanging while sending the data only have the
>>> error below, but that's related to compactions (see below), although
>>> it is one of the files that is waiting to be sent. I tried nodetool
>>> scrub on the column family with the missing item but got an error
>>> indicating it could not get a hard link. Any ideas? We were able to
>>> bootstrap one of the new nodes with no issues, but this other one has
>>> been a real pain. Note that when the new node is joining the cluster,
>>> it does not appear in nodetool status. Is that expected?
>>>
>>> Thanks all. My next step is to try getting a new IP for this machine,
>>> my thought being that the cluster doesn't like me repeatedly
>>> attempting to bootstrap the node, each time with a new host ID.
>>>
>>> [kwright@lxpcas008 ~]$ nodetool netstats | grep rts-40301_feedProducts-ib-1-Data.db
>>>    rts: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db sections=73 progress=0/1884669 - 0%
>>>
>>> ERROR [ReadStage:427] 2013-08-05 23:23:29,294 CassandraDaemon.java (line 174) Exception in thread Thread[ReadStage:427,5,main]
>>> java.lang.RuntimeException: java.io.FileNotFoundException: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db (No such file or directory)
>>>         at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
>>>         at org.apache.cassandra.io.util.CompressedSegmentedFile.createReader(CompressedSegmentedFile.java:57)
>>>         at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:41)
>>>         at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:976)
>>>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.createFileDataInput(SSTableNamesIterator.java:98)
>>>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:117)
>>>         at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:64)
>>>         at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
>>>         at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
>>>         at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:133)
>>>         at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
>>>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1357)
>>>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214)
>>>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1126)
>>>         at org.apache.cassandra.db.Table.getRow(Table.java:347)
>>>         at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
>>>         at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:44)
>>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:722)
>>> Caused by: java.io.FileNotFoundException: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db (No such file or directory)
>>>         at java.io.RandomAccessFile.open(Native Method)
>>>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>>>         at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
>>>         at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:75)
>>>         at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:42)
>>>         ... 20 more
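
The advice in this thread is to find every node that is emitting the FileNotFoundException and restart it. As a rough aid for that first step, here is a minimal Python sketch that pulls the distinct missing file paths out of a log excerpt. It is only a sketch under stated assumptions: the regular expression is modelled on the two log lines quoted in the thread, the function name `missing_sstables` is hypothetical, and other Cassandra versions may word the message differently.

```python
import re

# Assumption: the message format matches the log lines quoted in this thread,
# i.e. "java.io.FileNotFoundException: <path> (No such file or directory)".
MISSING_FILE = re.compile(
    r"java\.io\.FileNotFoundException: (\S+) \(No such file or directory\)"
)


def missing_sstables(log_text):
    """Return the distinct file paths reported missing in log_text, sorted.

    The same path usually appears twice per stack trace (once in the wrapping
    RuntimeException and once in the "Caused by" line), so duplicates are
    collapsed.
    """
    return sorted(set(MISSING_FILE.findall(log_text)))


if __name__ == "__main__":
    # Sample built from the two relevant lines of the stack trace in the thread.
    path = (
        "/data/1/cassandra/data/rts/40301_feedProducts/"
        "rts-40301_feedProducts-ib-1-Data.db"
    )
    sample = (
        "java.lang.RuntimeException: java.io.FileNotFoundException: "
        + path + " (No such file or directory)\n"
        "Caused by: java.io.FileNotFoundException: "
        + path + " (No such file or directory)\n"
    )
    for missing in missing_sstables(sample):
        print(missing)
```

Running this against the log of each node (the log location depends on the installation) gives a quick list of the files a node believes exist but cannot open. A node that reports any such file is a candidate for the restart described in the thread.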