incubator-cassandra-user mailing list archives

From Phil Stanhope <>
Subject Re: Possible bug in Cassandra MapReduce
Date Fri, 18 Jun 2010 22:11:36 GMT
"blow all the data away" ... how do you do that? What is the timestamp precision that you are
using when creating key/col or key/supercol/col items?

I have seen a write to a key fail when its timestamp is identical to the timestamp of a previously
deleted key/col. While I didn't examine the source code, I'm certain that this is due
to delete tombstones.
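To make the tombstone explanation concrete, here is a minimal sketch of last-write-wins reconciliation. It assumes the rule is "the higher timestamp wins, and on a tie a tombstone beats a live value". The `Cell` type and `reconcile` function are hypothetical illustrations, not Cassandra's actual classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one version of a column: a value or a delete tombstone."""
    timestamp: int
    value: Optional[str]  # None marks a delete tombstone

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the surviving version (assumed rule, simplified for illustration)."""
    # Different timestamps: the newer version wins outright.
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Same timestamp: a tombstone wins the tie, so the delete "sticks".
    if a.is_tombstone:
        return a
    if b.is_tombstone:
        return b
    # Tie between two live values: pick deterministically (here, the larger value).
    return a if a.value >= b.value else b


# A delete at t=1000 followed by a re-insert that reuses t=1000: the tombstone survives.
tombstone = Cell(timestamp=1000, value=None)
rewrite = Cell(timestamp=1000, value="hello")
print(reconcile(tombstone, rewrite).is_tombstone)

# A re-insert with a strictly larger timestamp becomes visible again.
print(reconcile(tombstone, Cell(timestamp=1001, value="hello")).value)
```

Under this model, the "failed" write is not rejected; it simply loses the tie to the tombstone until the tombstone is purged after GCGraceSeconds.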

I view this as an application error, because I was attempting to do this within the GCGraceSeconds
time period. If, however, I stopped Cassandra, blew away the data and commitlogs, and restarted,
the write always succeeded (no surprise there).

I turned this behavior into a feature (of sorts). When this happens, I increment a previously
unused portion of the timestamp (the last digit of precision, which was always zero) and
use it as a counter to track how many times a key/col was updated (max 9 for my purposes).
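The counter trick can be sketched as follows. This assumes timestamps are built by scaling a millisecond clock reading by 10, so the last decimal digit is always zero. `base_timestamp`, `bump`, and `update_count` are hypothetical helper names for illustration.

```python
def base_timestamp(millis: int) -> int:
    """Hypothetical: scale a millisecond clock reading so the last decimal digit is always 0."""
    return millis * 10


def bump(timestamp: int) -> int:
    """Increment the spare last digit, which doubles as an update counter (max 9)."""
    if timestamp % 10 == 9:
        raise OverflowError("update counter exhausted (max 9 updates)")
    return timestamp + 1


def update_count(timestamp: int) -> int:
    """Read the update counter back out of the spare last digit."""
    return timestamp % 10


ts = base_timestamp(1276899096000)
ts = bump(ts)  # first retry after losing to a tombstone
ts = bump(ts)  # second retry
print(ts, update_count(ts))
```

Each bumped timestamp is strictly larger than the one it replaces, so it wins against a tombstone written at the original timestamp. It also stays below the next millisecond's base value (base + 9 < base + 10), which keeps ordering across clock ticks intact.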


On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:

> We are using MapReduce to periodically verify and rebuild our secondary indexes, along with
> counting total records. We started to notice double counting of unique keys in single-machine
> standalone tests. We were finally able to reproduce the problem using the apache-cassandra-0.6.2-src/contrib/word_count
> example and simply re-running it multiple times. We are hoping someone can verify the bug.
> Re-run the tests and the word count in /tmp/word_count3/part-r-00000 will be 1000 + ~200,
> and it will change if you blow the data away and re-run. Notice that the setup script loops and
> only inserts 1000 records, so we expect the count to be 1000. Once the data is generated,
> re-running the setup script and/or the MapReduce job doesn't change the number (it is still off). The key
> is to blow all the data away and start over, which will cause it to change.
> Can someone please verify this behavior?
> -Corey
