From: Yuji Ito <yuji@imagine-orb.com>
Date: Fri, 25 Nov 2016 15:41:54 +0900
Subject: Re: Does recovery continue after truncating a table?
To: user@cassandra.apache.org

Hi all,

I revised the script to reproduce the issue.
I think the issue happens more frequently than before.
Compared to the previous script, it also kills another node (node2).

==== [script] ====
#!/bin/sh

node1_ip=<node1 IP address>
node2_ip=<node2 IP address>
node3_ip=<node3 IP address>
node2_user=<user name>
node3_user=<user name>
rows=10000

echo "consistency quorum;" > init_data.cql
for key in $(seq 0 $(expr $rows - 1))
do
    echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
done

while true
do
    echo "truncate the table"
    cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        echo "truncating failed"
        continue
    else
        break
    fi
done

echo "kill C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"

echo "insert $rows rows"
cqlsh $node1_ip -f init_data.cql > insert_log 2>&1

echo "restart C* process on node3"
pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"

while true
do
    echo "truncate the table again"
    cqlsh $node1_ip -e "truncate table testdb.testtbl"
    if [ $? -ne 0 ]; then
        echo "truncating failed"
        continue
    else
        echo "truncation succeeded!"
        break
    fi
done

echo "kill C* process on node2"
pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"

cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
sleep 10
cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"

echo "restart C* process on node2"
pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"

Thanks,
yuji


On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito <yuji@imagine-orb.com> wrote:
> I investigated the source code and the logs of the killed node.
> I guess that unexpected writes are executed while truncation is being executed.
>
> Some writes were executed after the flush (the first flush) in truncation, and these writes could be read.
> These writes were requested as MUTATION by another node for hinted handoff.
> Their data was stored in a new memtable and flushed (the second flush) to a new SSTable before the snapshot in truncation.
> So, the truncation discarded only the old SSTables, not the new SSTable.
> That's because the ReplayPosition used for discarding SSTables was that of the first flush.
>
> I copied some parts of the log below.
> "##" lines are my comments.
> The point is that the ReplayPosition is moved forward by the second flush.
> It means some writes are executed after the first flush.
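To make the sequence described above concrete, here is a minimal toy model (Python, not Cassandra code; every name in it is invented for the sketch). It only shows why discarding SSTables by the commit-log position recorded at the first flush leaves behind an SSTable produced by a later flush. The quoted log excerpt follows below.

==== [toy model (sketch, not Cassandra code)] ====
#!/usr/bin/env python3
# Toy model: discarding SSTables by the position recorded at the *first*
# flush of a truncation misses an SSTable that is flushed *after* that
# position was recorded (e.g. hinted-handoff mutations).

class Table:
    def __init__(self):
        self.position = 0    # monotonically increasing commit-log position
        self.memtable = []   # pending rows
        self.sstables = []   # list of (flush_position, rows)

    def write(self, row):
        self.position += 1
        self.memtable.append(row)

    def flush(self):
        if self.memtable:
            self.sstables.append((self.position, self.memtable))
            self.memtable = []
        return self.position

    def truncate(self):
        replay_position = self.flush()    # first flush; position recorded here
        self.write("hint-delivered row")  # MUTATION arrives mid-truncation
        self.flush()                      # second flush -> new SSTable
        # discard only SSTables at or before the recorded position
        self.sstables = [s for s in self.sstables if s[0] > replay_position]

t = Table()
t.write("row inserted before truncate")
t.truncate()
print(t.sstables)  # the hint-delivered row survives: stale data after truncate
====

Running it prints the SSTable holding the hint-delivered row, which is the kind of stale data read back after the truncate.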
>
> == log ==
> ## started truncation
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:2790 - truncating testtbl
> ## the first flush started before truncation
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 - Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops, 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=315867)
> ## this ReplayPosition was used for discarding SSTables
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the first flush
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:2823 - Discarding sstable data for truncated CF + indexes
> ## the second flush before the snapshot
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 698880 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:2] 2016-11-17 08:36:05,029 Memtable.java:352 - Writing Memtable-testtbl@1186728207(50.375KiB serialized bytes, 3328 ops, 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:2] 2016-11-17 08:36:05,258 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-2-big-Data.db (17.696KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=486627)
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,289 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=486627), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the second flush: the position was moved forward
> ...
> ## only the old SSTable was deleted because it was older than ReplayPosition(segmentId=1479371760395, position=315867)
> TRACE [NonPeriodicTasks:1] 2016-11-17 08:36:05,303 SSTable.java:118 - Deleted /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/lb-1-big
> ...
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 ColumnFamilyStore.java:2841 - truncate complete
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 TruncateVerbHandler.java:53 - Truncation(keyspace='testdb', cf='testtbl') applied.  Enqueuing response to 36512@/10.91.145.7
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 MessagingService.java:728 - /10.91.145.27 sending REQUEST_RESPONSE to 36512@/10.91.145.7
> ## end of truncation
> ====
>
> Actually, "truncated_at" for the table in system.local after running the script was 0x00000158716da30b0004d1db00000158716db524.
> It means segmentId=1479371760395, position=315867, truncated_at=1479371765028 (2016-11-17 08:36:05,028); a decode sketch follows at the end of this mail.
>
> thanks,
> yuji
>
>
> On Wed, Nov 16, 2016 at 5:25 PM, Yuji Ito <yuji@imagine-orb.com> wrote:
>
>> Hi,
>>
>> I found stale data after truncating a table.
>> It seems that truncation starts while recovery is still being executed just after a node restarts.
>> Does recovery still continue after the truncation finishes?
>> Is that expected?
>>
>> I use C* 2.2.8 and can reproduce it as below.
>>
>> ==== [create table] ====
>> cqlsh $ip -e "drop keyspace testdb;"
>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};"
>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>
>> ==== [script] ====
>> #!/bin/sh
>>
>> node1_ip=<node1 IP address>
>> node2_ip=<node2 IP address>
>> node3_ip=<node3 IP address>
>> node3_user=<user name>
>> rows=10000
>>
>> echo "consistency quorum;" > init_data.cql
>> for key in $(seq 0 $(expr $rows - 1))
>> do
>>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
>> done
>>
>> while true
>> do
>>     echo "truncate the table"
>>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>>     if [ $? -ne 0 ]; then
>>         echo "truncating failed"
>>         continue
>>     else
>>         break
>>     fi
>> done
>>
>> echo "kill C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>>
>> echo "insert $rows rows"
>> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>>
>> echo "restart C* process on node3"
>> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>>
>> while true
>> do
>>     echo "truncate the table again"
>>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>>     if [ $? -ne 0 ]; then
>>         echo "truncating failed"
>>         continue
>>     else
>>         break
>>     fi
>> done
>>
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>> sleep 10
>> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>>
>> ==== [result] ====
>> truncate the table
>> kill C* process on node3
>> insert 10000 rows
>> restart C* process on node3
>> 10.91.145.27: Starting Cassandra: OK
>> truncate the table again
>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>> truncating failed
>> truncate the table again
>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>> truncating failed
>> truncate the table again
>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>> truncating failed
>> truncate the table again
>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>> truncating failed
>> truncate the table again
>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>> truncating failed
>> truncate the table again
>> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
>> truncating failed
>> truncate the table again
>> Consistency level set to SERIAL.
>>
>>  count
>> -------
>>    300
>>
>> (1 rows)
>>
>> Warnings :
>> Aggregation query used without partition key
>>
>> Consistency level set to SERIAL.
>>
>>  count
>> -------
>>   2304
>>
>> (1 rows)
>>
>> Warnings :
>> Aggregation query used without partition key
>> ====
>>
>> I found this while I was investigating the data loss problem (ref. the "failure node rejoin" thread).
>> I'm not sure whether this problem is related to that data loss.
>>
>> Thanks,
>> yuji
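
For reference, the "truncated_at" blob quoted above can be decoded with a quick sketch like the one below. It assumes the serialized layout is an 8-byte segmentId, a 4-byte position, and an 8-byte millisecond timestamp, all big-endian, which matches the values quoted in the mail.

==== [decoding truncated_at (sketch)] ====
#!/usr/bin/env python3
import struct
from datetime import datetime, timezone

# Assumed layout: 8-byte segmentId, 4-byte position, 8-byte timestamp (ms),
# all big-endian; this matches the values quoted in the mail above.
blob = bytes.fromhex("00000158716da30b0004d1db00000158716db524")
segment_id, position, truncated_at_ms = struct.unpack(">QIQ", blob)

print("segmentId    =", segment_id)       # 1479371760395
print("position     =", position)         # 315867
print("truncated_at =", truncated_at_ms,  # 1479371765028
      datetime.fromtimestamp(truncated_at_ms / 1000, tz=timezone.utc))
====

The decoded position (315867) is the ReplayPosition of the first flush, not of the second flush (486627), which is consistent with the log excerpt above.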