From: Ben Slater
Date: Sat, 26 Nov 2016 06:13:00 +0000
Subject: Re: Does recovery continue after truncating a table?
To: user@cassandra.apache.org

Nice detective work! It seems to me that it's at best an undocumented limitation and could potentially be viewed as a bug - maybe log another JIRA?

One note: there is a nodetool truncatehints command that could be used to clear out the hints (http://cassandra.apache.org/doc/latest/tools/nodetool/truncatehints.html?highlight=truncate). However, it seems to clear all hints on a particular endpoint, not just those for a specific table.

Cheers
Ben

On Fri, 25 Nov 2016 at 17:42 Yuji Ito <yuji@imagine-orb.com> wrote:

> Hi all,
>
> I revised the script to reproduce the issue.
> I think the issue happens more frequently than before.
> Killing another node has been added to the previous script.
>
> ==== [script] ====
> #!/bin/sh
>
> node1_ip=<node1 IP address>
> node2_ip=<node2 IP address>
> node3_ip=<node3 IP address>
> node2_user=<user name>
> node3_user=<user name>
> rows=10000
>
> echo "consistency quorum;" > init_data.cql
> for key in $(seq 0 $(expr $rows - 1))
> do
>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
> done
>
> while true
> do
>     echo "truncate the table"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl" > /dev/null 2>&1
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         break
>     fi
> done
>
> echo "kill C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> echo "insert $rows rows"
> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>
> echo "restart C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>
> while true
> do
>     echo "truncate the table again"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         echo "truncation succeeded!"
>         break
>     fi
> done
>
> echo "kill C* process on node2"
> pdsh -l $node2_user -R ssh -w $node2_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
> sleep 10
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>
> echo "restart C* process on node2"
> pdsh -l $node2_user -R ssh -w $node2_ip "sudo /etc/init.d/cassandra start"
>
> Thanks,
> yuji
>
> On Fri, Nov 18, 2016 at 7:52 PM, Yuji Ito <yuji@imagine-orb.com> wrote:
>
> I investigated the source code and the logs of the killed node.
> I suspect that unexpected writes are applied while the truncation is being executed.
>
> Some writes were executed after the flush (the first flush) in the truncation, and these writes could be read.
> These writes were requested as MUTATION by another node for hinted handoff.
> Their data was stored in a new memtable and flushed (the second flush) to a new SSTable before the snapshot taken during truncation.
> So the truncation discarded only the old SSTables, not the new SSTable.
> That's because the ReplayPosition used for discarding SSTables was that of the first flush.
>
> I copied some parts of the log below.
> Lines starting with "##" are my comments.
> The point is that the ReplayPosition is moved forward by the second flush.
> It means some writes are executed after the first flush.
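The discard decision described above can be modeled with a tiny sketch (illustrative only, not Cassandra's actual code; the two positions are the ones reported by the flush lines in the log that follows):

```shell
#!/bin/sh
# Illustrative model: truncation records the ReplayPosition of its first
# flush (position=315867) and discards only SSTables at or before that
# position. The SSTable written by the second flush (position=486627)
# is newer, so it survives and its hinted-handoff writes remain readable.
truncation_pos=315867

for sstable_pos in 315867 486627; do
    if [ "$sstable_pos" -le "$truncation_pos" ]; then
        echo "sstable flushed at position $sstable_pos: discarded"
    else
        echo "sstable flushed at position $sstable_pos: survives truncation"
    fi
done
```

With these numbers the first SSTable is discarded and the second is kept, which matches the "Deleted ... lb-1-big" line in the log while lb-2 remains.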
>
> ==== log ====
> ## truncation started
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:2790 - truncating testtbl
> ## the first flush started before truncation
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:04,612 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 591360 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:1] 2016-11-17 08:36:04,613 Memtable.java:352 - Writing Memtable-testtbl@1863835308(42.625KiB serialized bytes, 2816 ops, 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:1] 2016-11-17 08:36:04,973 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-1-big-Data.db (17.651KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=315867)
> ## this ReplayPosition was used for discarding SSTables
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,022 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=315867), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the first flush
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:2823 - Discarding sstable data for truncated CF + indexes
> ## the second flush, before the snapshot
> DEBUG [SharedPool-Worker-16] 2016-11-17 08:36:05,028 ColumnFamilyStore.java:952 - Enqueuing flush of testtbl: 698880 (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:2] 2016-11-17 08:36:05,029 Memtable.java:352 - Writing Memtable-testtbl@1186728207(50.375KiB serialized bytes, 3328 ops, 0%/0% of on/off-heap limit)
> ...
> DEBUG [MemtableFlushWriter:2] 2016-11-17 08:36:05,258 Memtable.java:386 - Completed flushing /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/tmp-lb-2-big-Data.db (17.696KiB) for commitlog position ReplayPosition(segmentId=1479371760395, position=486627)
> ...
> TRACE [MemtablePostFlush:1] 2016-11-17 08:36:05,289 CommitLog.java:298 - discard completed log segments for ReplayPosition(segmentId=1479371760395, position=486627), table 562848f0-a556-11e6-8b14-51065d58fdfb
> ## end of the second flush: the position has moved forward
> ...
> ## only the old SSTable was deleted, because only it was older than ReplayPosition(segmentId=1479371760395, position=315867)
> TRACE [NonPeriodicTasks:1] 2016-11-17 08:36:05,303 SSTable.java:118 - Deleted /var/lib/cassandra/data/testdb/testtbl-562848f0a55611e68b1451065d58fdfb/lb-1-big
> ...
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 ColumnFamilyStore.java:2841 - truncate complete
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 TruncateVerbHandler.java:53 - Truncation(keyspace='testdb', cf='testtbl') applied.  Enqueuing response to 36512@/10.91.145.7
> TRACE [SharedPool-Worker-16] 2016-11-17 08:36:05,320 MessagingService.java:728 - /10.91.145.27 sending REQUEST_RESPONSE to 36512@/10.91.145.7
> ## end of truncation
> ====
>
> Actually, "truncated_at" for the table in system.local after running the script was 0x00000158716da30b0004d1db00000158716db524.
> It means segmentId=1479371760395, position=315867, and truncated_at=1479371765028 (2016-11-17 08:36:05,028).
>
> thanks,
> yuji
>
> On Wed, Nov 16, 2016 at 5:25 PM, Yuji Ito <yuji@imagine-orb.com> wrote:
>
> Hi,
>
> I could find stale data after truncating a table.
> It seems that the truncation starts while recovery is still running, just after a node restarts.
> Does recovery continue even after the truncation finishes?
> Is that expected?
>
> I use C* 2.2.8 and can reproduce it as below.
>
> ==== [create table] ====
> cqlsh $ip -e "drop keyspace testdb;"
> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};"
> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>
> ==== [script] ====
> #!/bin/sh
>
> node1_ip=<node1 IP address>
> node2_ip=<node2 IP address>
> node3_ip=<node3 IP address>
> node3_user=<user name>
> rows=10000
>
> echo "consistency quorum;" > init_data.cql
> for key in $(seq 0 $(expr $rows - 1))
> do
>     echo "insert into testdb.testtbl (key, val) values($key, 1111) IF NOT EXISTS;" >> init_data.cql
> done
>
> while true
> do
>     echo "truncate the table"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         break
>     fi
> done
>
> echo "kill C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "ps auxww | grep CassandraDaemon | awk '{if (\$13 ~ /cassand/) print \$2}' | xargs sudo kill -9"
>
> echo "insert $rows rows"
> cqlsh $node1_ip -f init_data.cql > insert_log 2>&1
>
> echo "restart C* process on node3"
> pdsh -l $node3_user -R ssh -w $node3_ip "sudo /etc/init.d/cassandra start"
>
> while true
> do
>     echo "truncate the table again"
>     cqlsh $node1_ip -e "truncate table testdb.testtbl"
>     if [ $? -ne 0 ]; then
>         echo "truncating failed"
>         continue
>     else
>         break
>     fi
> done
>
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
> sleep 10
> cqlsh $node1_ip --request-timeout 3600 -e "consistency serial; select count(*) from testdb.testtbl;"
>
> ==== [result] ====
> truncate the table
> kill C* process on node3
> insert 10000 rows
> restart C* process on node3
> 10.91.145.27: Starting Cassandra: OK
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> <stdin>:1:TruncateError: Error during truncate: Cannot achieve consistency level ALL
> truncating failed
> truncate the table again
> Consistency level set to SERIAL.
>
>  count
> -------
>    300
>
> (1 rows)
>
> Warnings :
> Aggregation query used without partition key
>
> Consistency level set to SERIAL.
>
>  count
> -------
>   2304
>
> (1 rows)
>
> Warnings :
> Aggregation query used without partition key
> ====
>
> I found this while I was investigating a data-loss problem (see the "failure node rejoin" thread).
> I'm not sure whether this problem is related to that data loss.
>
> Thanks,
> yuji
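As a side note, the truncated_at blob quoted in the Nov 18 message can be unpacked with plain bash arithmetic. The layout assumed here (8-byte segmentId, 4-byte commit-log position, 8-byte millisecond timestamp, big-endian) is inferred from the values the message itself reports, not from a documented format:

```shell
#!/bin/bash
# Unpack the system.local truncated_at blob (field layout inferred, see above).
blob=00000158716da30b0004d1db00000158716db524

segment_id=$((16#${blob:0:16}))    # first 8 bytes
position=$((16#${blob:16:8}))      # next 4 bytes
truncated_at=$((16#${blob:24:16})) # last 8 bytes, epoch millis

echo "segmentId=$segment_id"       # 1479371760395
echo "position=$position"          # 315867
echo "truncated_at=$truncated_at"  # 1479371765028 (2016-11-17 08:36:05,028)
```

The decoded position (315867) is exactly the ReplayPosition of the first flush, which is consistent with the log analysis: the second flush's SSTable at position 486627 is newer than the recorded truncation point and therefore survives.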