cassandra-user mailing list archives

From Roland Gude <>
Subject Re: Data ends up in wrong Columnfamily
Date Fri, 11 Feb 2011 09:17:22 GMT

Machine A has absolutely no knowledge of the other application, not
even the columnfamily name.
I was digging into this further:

Since the data I find in the wrong place has a timestamp in its row key, it was quite easy
to find out that the data was relatively old. Unfortunately, it dates from a time for which
I do not have batch mutation logs from the server side.
I think this might be related to the “deleted columns reappear” thread, as I saw the following:

· I truncated the columnfamily that contained the wrong data using the Cassandra CLI.
· I regenerated the correct data for that columnfamily.
· I ran repair on a node in the cluster.
· -> The data reappeared.

I tried this multiple times, and even tried truncating the columnfamily with clustertool
on the slight chance that it does something different from the CLI when truncating. So far
I have not been able to remove the data from the cluster.
Another strange thing about the issue is that repair seems to blow up the data indefinitely.
The columnfamily that contains the wrong data holds around 200 KB of correct data before I repair.
The complete cluster contains around 6 GB of data (3 nodes, 3 GB each, replication factor 2).
After repair on one node, that node contains about 14 GB of data. If I then trigger a repair
on the second node, it gets to around 24 GB of data before it runs out of memory (OOM).
Reaching 24 GB seems impossible to me given the amount of data I have written
to the cluster. I can only imagine that it is data that was once deleted but keeps reappearing,
and while doing so, it reappears in the wrong place.
Note that the columnfamily that contains the wrong data did not even exist when the data was
first written (it was created with the CLI only a couple of days ago, while the oldest row
I could find that was not supposed to exist was from January 7th).
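
One way to confirm that rows predate the columnfamily is to compare the timestamp embedded in each row key with the date the columnfamily was created. The following is only a minimal sketch: the key layout `<name>:<epoch seconds>`, the creation date, and the sample keys are assumptions for illustration, not values taken from the cluster.

```python
from datetime import datetime, timezone

def key_timestamp(row_key):
    """Extract the timestamp from a key of the assumed form '<name>:<epoch seconds>'."""
    _, _, epoch = row_key.rpartition(":")
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc)

def rows_older_than(row_keys, created_at):
    """Return the keys whose embedded timestamp predates created_at."""
    return [k for k in row_keys if key_timestamp(k) < created_at]

# Hypothetical values for illustration only.
cf_created = datetime(2011, 2, 8, tzinfo=timezone.utc)
keys = [
    "event:1294358400",  # 2011-01-07 00:00:00 UTC
    "event:1297382400",  # 2011-02-11 00:00:00 UTC
]
suspicious = rows_older_than(keys, cf_created)
```

Any key returned by `rows_older_than` carries a timestamp from before the columnfamily existed, so it cannot have been written there by a normal client write.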

We did fail to run repair regularly on that cluster in the meantime.

If I find a BatchMutation log that indicates an incorrect write received by the server, I
will post it.


From: Aaron Morton []
Sent: Thursday, 10 February 2011 21:37
Subject: Re: Data ends up in wrong Columnfamily

I have not heard of that before; chances are it's a problem in your code. Does machine A even know
the other CF name? Can you log the batch mutations you are sending? When it appears in the
other CF, is the data complete?
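
Logging the mutations on the client side could be as simple as recording the target columnfamily and row key of every mutation just before the batch is sent. The following is a minimal sketch and not Hector's API: `LoggingBatch`, `add`, and `send` are hypothetical names, and the `sender` callable stands in for whatever actually submits the batch.

```python
import logging

log = logging.getLogger("batch_mutations")

class LoggingBatch:
    """Collects mutations and logs each target before handing the batch to a sender."""

    def __init__(self, sender):
        self._sender = sender  # hypothetical callable that submits the batch
        self._mutations = []

    def add(self, column_family, row_key, columns):
        # Copy the columns so later changes by the caller do not alter the batch.
        self._mutations.append((column_family, row_key, dict(columns)))

    def send(self):
        for column_family, row_key, columns in self._mutations:
            log.info("mutation cf=%s key=%s columns=%s",
                     column_family, row_key, sorted(columns))
        self._sender(list(self._mutations))
        self._mutations.clear()

# Usage with a stand-in sender that only records what it was given.
sent = []
batch = LoggingBatch(sent.extend)
batch.add("CF_A", "row-1", {"col": "value"})
batch.send()
```

With such a log in place, a row that shows up in an unexpected columnfamily can be checked against what the client actually asked for.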

There is also a Hector list, perhaps they can help.


On 10/02/2011, at 11:58 PM, Roland Gude <> wrote:

I am experiencing a strange issue. I have two applications writing to Cassandra (to different
column families in the same keyspace). The applications reside on different machines and know
nothing about each other's existence.
They both produce data and write it to Cassandra with batch mutations using Hector.
So far so good, but it regularly happens that data from one application ends up in columnfamilies
reserved for the other application as well as in the intended columnfamily.

Machine A writes to column family CF_A
Machine B writes to column families CF_B to CF_N

Regularly, data that was written (according to my application logs) from Machine A to CF_A
ends up in CF_A and in one of the other columnfamilies.
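
To quantify the overlap, the row keys of CF_A can be compared with those of each other columnfamily. The following is a minimal sketch, assuming the row keys have already been exported into one in-memory set per columnfamily; the export step, the function name `misplaced_keys`, and the sample keys are hypothetical.

```python
def misplaced_keys(source_keys, other_cfs):
    """Map each other columnfamily to the sorted keys it shares with the source."""
    found = {}
    for name, keys in other_cfs.items():
        shared = sorted(source_keys & keys)
        if shared:
            found[name] = shared
    return found

# Hypothetical sample data.
cf_a = {"a1", "a2", "a3"}
others = {"CF_B": {"b1", "a2"}, "CF_C": {"c1"}}
overlap = misplaced_keys(cf_a, others)
```

Columnfamilies without shared keys are left out of the result, so an empty dictionary means no CF_A row key was found elsewhere.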

Any ideas why this could be happening?

I am using Cassandra 0.7.0 and Hector 0.7.0-23.



Roland Gude
Software Engineer

Im Mediapark 8, 50670 Köln

+49 221 4544151 (Tel)
+49 221 4544159 (Fax)
+49 171 7894057 (Mobile)


Managing directors: Dr. Uwe Alkemper, Michael Friedmann
Commercial register: Amtsgericht Köln HRB 65275
VAT ID: DE 264 773 520
Registered office: Köln
