Subject: Possibly losing data with corrupted SSTables
From: Francisco Nogueira Calmon Sobral <fsobral@igcorp.com.br>
Date: Tue, 28 Jan 2014 17:25:52 -0200
To: user@cassandra.apache.org

Dear experts,

We are facing an annoying problem in our cluster. We have 9 Amazon extra large Linux nodes running Cassandra 1.2.11.

The short story is that after moving the data from one cluster to another, we have been unable to run 'nodetool repair'. It gets stuck due to a CorruptSSTableException on some nodes and CFs. After looking at some of the problematic CFs, we observed that some of their SSTables have root permissions instead of cassandra permissions.
Also, their names are different from the 'good' ones, as we can see below:

BAD
------
-rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11 Sessions-Users-ib-2516-Data.db
-rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11 Sessions-Users-ib-2516-Index.db
-rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db

GOOD
---------
-rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
-rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
-rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
-rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
-rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
-rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt

We changed the permissions back to 'cassandra' and ran 'nodetool scrub' on this problematic CF (the commands we used are sketched in the P.S. below), but it has been running for at least two weeks (it is not frozen) and keeps logging many WARNs while working with the above-mentioned SSTable:

WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
        at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
        at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
        at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
        at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
        at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
        at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Impossible row size 3618452438597849419
        ... 10 more

1) I do not think that deleting all the data on one node and running 'nodetool rebuild' will work, since we observed that this problem occurs on all nodes, so we may not be able to restore all the data. What can be done in this case?

2) Why are the permissions of some SSTables set to 'root'? Was this caused by our manual migration of the data? (See the long story below.)

How did we run into this?

The long story is that we tried to move our cluster with sstableloader, but it was unable to load all the data correctly. Our solution was to put ALL the cluster's data onto EACH new node and run 'nodetool refresh' (a rough sketch of the procedure is also in the P.S. below). I performed this task for each node and each column family sequentially. Sometimes I had to rename some SSTables, because files from different nodes arrived with the same name. I don't remember whether I ran 'nodetool repair' or even 'nodetool cleanup' on each node afterwards. Apparently the process was successful, and (almost) all the data was moved.

Unfortunately, three months after the move, I am unable to perform read operations on some keys of some CFs. I think that some of these keys belong to the above-mentioned SSTables.

Any insights are welcome.

Best regards,
Francisco Sobral
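
P.S. For reference, the permission fix and the scrub were issued roughly as follows. This is only a sketch: it assumes the default data directory /var/lib/cassandra/data, and 'Sessions'/'Users' stand for the keyspace and column family (as suggested by the SSTable file names).

    # reset ownership of the affected CF's files back to the cassandra user (run as root)
    chown -R cassandra:cassandra /var/lib/cassandra/data/Sessions/Users/

    # then start an online scrub of just that column family
    nodetool -h localhost scrub Sessions Users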
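
And the per-node, per-CF step of the migration was roughly the following. Again just a sketch: the /backup/old-cluster path and the generation numbers in the renaming example are illustrative, not the real values.

    # copy the SSTables collected from ALL old nodes into the new node's data directory;
    # when two source nodes shipped an SSTable with the same name, the whole incoming set
    # of components was renamed to an unused generation number first, e.g.
    #   Sessions-Users-ic-1024-*  ->  Sessions-Users-ic-9024-*
    cp /backup/old-cluster/Sessions/Users/Sessions-Users-ic-* /var/lib/cassandra/data/Sessions/Users/

    # then pick up the newly placed files without restarting the node
    nodetool -h localhost refresh Sessions Users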