incubator-cassandra-user mailing list archives

From Corey Hulen
Subject Possible bug in Cassandra MapReduce
Date Fri, 18 Jun 2010 21:49:24 GMT
We are using MapReduce to periodically verify and rebuild our secondary
indexes and to count total records.  We started to notice double
counting of unique keys in standalone tests on a single machine.  We were
finally able to reproduce the problem using
the apache-cassandra-0.6.2-src/contrib/word_count example, just by
re-running it multiple times.  We are hoping someone can verify the bug.

To reproduce, re-run the tests.  The word count in
/tmp/word_count3/part-r-00000 will be 1000 plus roughly 200, and the exact
number will change if you blow the data away and re-run.  Note that the
setup script loops and inserts only 1000 records, so we expect the count to
be 1000.  Once the data is generated, re-running the setup script and/or
the MapReduce job doesn't change the number (it is still off).  The key is
to blow all the data away and start over, which causes the number to change.
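
For anyone trying to confirm this, here is a minimal sketch of a script
that totals the counts in a reducer output file and reports how far the
total is from the expected 1000.  It assumes the output is plain text with
one tab-separated "key<TAB>count" pair per line; that format, and the
function names total_count and check_output, are assumptions for
illustration and are not taken from the word_count example itself.

```python
from pathlib import Path

# The setup script inserts 1000 records, so 1000 is the expected total.
EXPECTED_TOTAL = 1000


def total_count(lines):
    """Sum the count column of tab-separated 'key<TAB>count' lines.

    Assumption: each non-empty line ends with a tab followed by an integer.
    """
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        _key, count = line.rsplit("\t", 1)
        total += int(count)
    return total


def check_output(path, expected=EXPECTED_TOTAL):
    """Return (total, surplus) for one reducer output file."""
    with Path(path).open() as handle:
        total = total_count(handle)
    return total, total - expected
```

Calling check_output("/tmp/word_count3/part-r-00000") after each run
returns the total and the surplus over 1000, which makes it easy to compare
the numbers between a re-run and a run after wiping the data.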

Can someone please verify this behavior?

