From: Ben Slater
Date: Sat, 26 Nov 2016 06:13:00 +0000
Subject: Re: Does recovery continue after truncating a table?
To: user@cassandra.apache.org

Nice detective work! It seems to me that it's at best an undocumented limitation and could potentially be viewed as a bug - maybe log another JIRA?

One note: there is a nodetool truncatehints command that could be used to clear out the hints (http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate). However, it seems to clear all hints on a particular endpoint, not just those for a specific table.

Cheers
Ben

On Fri, 25 Nov 2016 at 17:42 Yuji Ito <yuji@imagine-orb.com> wrote:

> Hi all,
>
> I revised the script to reproduce the issue.
> I think the issue happens more frequently than before.
> Killing another node has been added to the previous script.
>
> ==== [script] ====
> #!/bin/sh
>
> node1_ip=<node1 IP address>
> node2_ip=<node2 IP address>
> node3_ip=<node3 IP address>
> node2_user=<user name>
> node3_user=<user name>
> rows=10000
>
> echo "consistency quorum;" > init_data.cql
> for key in $(seq 0 $(expr $rows - 1))
> do
>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
> done
>
> while true
> do
>     echo "truncate the table"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         break
>     fi
> done
>
> echo "kill C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> echo "insert $rows rows"
> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>
> echo "restart C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>
> while true
> do
>     echo "truncate the table again"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         echo "truncation succeeded!"
>         break
>     fi
> done
>
> echo "kill C* process on node2"
> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
> sleep 10
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>
> echo "restart C* process on node2"
> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"
>
> Thanks,
> yuji
>
> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito <yuji@imagine-orb.com> wrote:
>
> I investigated the source code and the logs of the killed node.
> I suspect that unexpected writes are applied while the truncation is being executed.
>
> Some writes were executed after the flush (the first flush) in the truncation, and these writes could be read.
> These writes were requested as MUTATION by another node for hinted handoff.
> Their data was stored in a new memtable and flushed (the second flush) to a new SSTable before the snapshot taken during truncation.
> So the truncation discarded only the old SSTables, not the new SSTable.
> That's because the ReplayPosition used for discarding SSTables was that of the first flush.
>
> I copied some parts of the log below.
> Lines starting with "##" are my comments.
> The point is that the ReplayPosition is moved forward by the second flush.
> It means some writes are executed after the first flush.
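The discard decision described above can be modeled with a tiny sketch (illustrative only, not Cassandra's actual code; the two positions are the ones reported by the flush lines in the log that follows):

```shell
#!/bin/sh
# Illustrative model: truncation records the ReplayPosition of its first
# flush (position=315867) and discards only SSTables at or before that
# position. The SSTable written by the second flush (position=486627)
# is newer, so it survives and its hinted-handoff writes remain readable.
truncation_pos=315867

for sstable_pos in 315867 486627; do
    if [ "$sstable_pos" -le "$truncation_pos" ]; then
        echo "sstable flushed at position $sstable_pos: discarded"
    else
        echo "sstable flushed at position $sstable_pos: survives truncation"
    fi
done
```

With these numbers the first SSTable is discarded and the second is kept, which matches the "Deleted ... lb-1-big" line in the log while lb-2 remains.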
>
> ==== log ====
> ## truncation started
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:2790 - truncating testtbl
> ## the first flush started before truncation
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 - Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops, 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=315867)
> ## this ReplayPosition was used for discarding SSTables
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the first flush
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:2823 - Discarding sstable data for truncated CF + indexes
> ## the second flush, before the snapshot
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 698880 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:2] 2016-11-17 08:36:05,029 Memtable.java:352 - Writing Memtable-testtbl@1186728207(50.375KiB serialized bytes, 3328 ops, 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:2] 2016-11-17 08:36:05,258 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-2-big-Data.db (17.696KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=486627)
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,289 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=486627), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the second flush: the position has moved forward
> ...
> ## only the old SSTable was deleted, because only it was older than ReplayPosition(segmentId=1479371760395, position=315867)
> TRACE [NonPeriodicTasks:1] 2016-11-17 08:36:05,303 SSTable.java:118 - Deleted /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/lb-1-big
> ...
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 ColumnFamilyStore.java:2841 - truncate complete
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 TruncateVerbHandler.java:53 - Truncation(keyspace='testdb', cf='testtbl') applied.  Enqueuing response to 36512@/10.91.145.7
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 MessagingService.java:728 - /10.91.145.27 sending REQUEST_RESPONSE to 36512@/10.91.145.7
> ## end of truncation
> ====
>
> Actually, "truncated_at" for the table in system.local after running the script was 0x00000158716da30b0004d1db00000158716db524.
> It means segmentId=1479371760395, position=315867, and truncated_at=1479371765028 (2016-11-17 08:36:05,028).
>
> thanks,
> yuji
>
> On Wed, Nov 16, 2016 at 5:25 PM, Yuji Ito <yuji@imagine-orb.com> wrote:
>
> Hi,
>
> I could find stale data after truncating a table.
> It seems that the truncation starts while recovery is still running, just after a node restarts.
> Does recovery continue even after the truncation finishes?
> Is that expected?
>
> I use C* 2.2.8 and can reproduce it as below.
>
> ==== [create table] ====
> cqlsh $ip -e "drop keyspace testdb;"
> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};"
> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>
> ==== [script] ====
> #!/bin/sh
>
> node1_ip=<node1 IP address>
> node2_ip=<node2 IP address>
> node3_ip=<node3 IP address>
> node3_user=<user name>
> rows=10000
>
> echo "consistency quorum;" > init_data.cql
> for key in $(seq 0 $(expr $rows - 1))
> do
>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
> done
>
> while true
> do
>     echo "truncate the table"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         break
>     fi
> done
>
> echo "kill C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> echo "insert $rows rows"
> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>
> echo "restart C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>
> while true
> do
>     echo "truncate the table again"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         break
>     fi
> done
>
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
> sleep 10
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>
> ==== [result] ====
> truncate the table
> kill C* process on node3
> insert 10000 rows
> restart C* process on node3
> 10.91.145.27: Starting Cassandra: OK
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> Consistency level set to SERIAL.
>
>  count
> -------
>    300
>
> (1 rows)
>
> Warnings :
> Aggregation query used without partition key
>
> Consistency level set to SERIAL.
>
>  count
> -------
>   2304
>
> (1 rows)
>
> Warnings :
> Aggregation query used without partition key
> ====
>
> I found this while I was investigating a data-loss problem (see the "failure node rejoin" thread).
> I'm not sure whether this problem is related to that data loss.
>
> Thanks,
> yuji
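As a side note, the truncated_at blob quoted in the Nov 18 message can be unpacked with plain bash arithmetic. The layout assumed here (8-byte segmentId, 4-byte commit-log position, 8-byte millisecond timestamp, big-endian) is inferred from the values the message itself reports, not from a documented format:

```shell
#!/bin/bash
# Unpack the system.local truncated_at blob (field layout inferred, see above).
blob=00000158716da30b0004d1db00000158716db524

segment_id=$((16#${blob:0:16}))    # first 8 bytes
position=$((16#${blob:16:8}))      # next 4 bytes
truncated_at=$((16#${blob:24:16})) # last 8 bytes, epoch millis

echo "segmentId=$segment_id"       # 1479371760395
echo "position=$position"          # 315867
echo "truncated_at=$truncated_at"  # 1479371765028 (2016-11-17 08:36:05,028)
```

The decoded position (315867) is exactly the ReplayPosition of the first flush, which is consistent with the log analysis: the second flush's SSTable at position 486627 is newer than the recorded truncation point and therefore survives.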