incubator-cassandra-user mailing list archives

From Corey Hulen ...@earnstone.com>
Subject Re: Possible bug in Cassandra MapReduce
Date Fri, 18 Jun 2010 22:15:56 GMT
I thought the same thing, but using the supplied contrib example I just
delete the /var/lib/data dirs and commit log.

-Corey



On Fri, Jun 18, 2010 at 3:11 PM, Phil Stanhope <pstanhope@wimba.com> wrote:

> "blow all the data away" ... how do you do that? What is the timestamp
> precision that you are using when creating key/col or key/supercol/col
> items?
>
> I have seen a write to a key fail when the timestamp is identical to the
> timestamp of a previously deleted key/col. While I didn't examine the source
> code, I'm certain that this is due to delete tombstones.
>
> I view this as an application error because I was attempting to do this
> within the GCGraceSeconds time period. If, however, I stopped Cassandra,
> blew away the data & commit logs and restarted, the write always succeeded
> (no surprise there).
>
> I turned this behavior into a feature (of sorts). When this happens I
> increment the last digit of precision of the timestamp (which was
> previously always zero) and use this as a counter to track how many
> times a key/col was updated (max 9 for my purposes).
>
> -phil
>
> On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:
>
> >
> > We are using MapReduce to periodically verify and rebuild our secondary
> indexes along with counting total records.  We started to notice double
> counting of unique keys on single-machine standalone tests. We were finally
> able to reproduce the problem using the
> apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it
> multiple times.  We are hoping someone can verify the bug.
> >
> > Re-run the tests and the word count in /tmp/word_count3/part-r-00000
> will be 1000 + ~200, and it will change if you blow the data away and re-run.
> Notice that the setup script loops and inserts only 1000 records, so we expect
> the count to be 1000.  Once the data is generated, re-running the setup
> script and/or MapReduce doesn't change the number (it is still off).  The key is
> to blow all the data away and start over, which will cause it to change.
> >
> > Can someone please verify this behavior?
> >
> > -Corey
>
>
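
The timestamp workaround Phil describes in the quoted message can be sketched roughly as follows. This is a toy in-memory model, not Cassandra code: `ToyColumnStore` and `write_with_bump` are hypothetical names, and the rule that a tombstone wins when timestamps are equal is an assumption chosen to match the behavior Phil observed (he notes he did not check the source). It also assumes the original timestamp ends in a zero digit, as in his setup.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Cell:
    timestamp: int
    value: Optional[str]  # None marks a tombstone (a delete marker)


class ToyColumnStore:
    """Toy last-write-wins store. Hypothetical; not Cassandra's implementation."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def _apply(self, key: str, new: Cell) -> None:
        old = self.cells.get(key)
        if old is None or new.timestamp > old.timestamp:
            self.cells[key] = new
        elif new.timestamp == old.timestamp and new.value is None:
            # Assumption: on a timestamp tie the tombstone wins.
            self.cells[key] = new

    def insert(self, key: str, value: str, timestamp: int) -> None:
        self._apply(key, Cell(timestamp, value))

    def delete(self, key: str, timestamp: int) -> None:
        self._apply(key, Cell(timestamp, None))

    def get(self, key: str) -> Optional[str]:
        cell = self.cells.get(key)
        return None if cell is None else cell.value


def write_with_bump(store: ToyColumnStore, key: str, value: str,
                    timestamp: int, max_bumps: int = 9) -> Tuple[int, int]:
    """Retry a write, incrementing the last timestamp digit until it is visible.

    Returns (timestamp actually used, number of bumps). Assumes `timestamp`
    ends in 0 so the last digit can act as a counter (max 9).
    """
    for bump in range(max_bumps + 1):
        store.insert(key, value, timestamp + bump)
        if store.get(key) == value:
            return timestamp + bump, bump
    raise RuntimeError("write still shadowed after %d bumps" % max_bumps)


store = ToyColumnStore()
store.delete("key/col", 1000)
store.insert("key/col", "v", 1000)        # same timestamp as the tombstone
shadowed = store.get("key/col")           # None: the write is not visible
used_ts, bumps = write_with_bump(store, "key/col", "v", 1000)  # (1001, 1)
```

In this model the write at timestamp 1000 stays hidden behind the tombstone, while the bumped write at 1001 becomes visible. Stopping the node and deleting the data and commit log directories corresponds to starting from an empty `ToyColumnStore`, which is why the write always succeeds in that case.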
