Subject: Re: Possible bug in Cassandra MapReduce
From: Corey Hulen
To: user@cassandra.apache.org
Date: Fri, 18 Jun 2010 20:38:11 -0700

Awesome...thanks.

I just downloaded the patch, applied it, and verified that it fixes our problems.

What's the ETA on 0.6.3? (We're debating whether to tolerate it or maintain our own 0.6.2+patch.)

-Corey

On Fri, Jun 18, 2010 at 8:21 PM, Jonathan Ellis wrote:
> Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042
>
> On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen wrote:
> >
> > We are using MapReduce to periodically verify and rebuild our secondary
> > indexes, along with counting total records. We started to notice double
> > counting of unique keys in single-machine standalone tests. We were
> > finally able to reproduce the problem using the
> > apache-cassandra-0.6.2-src/contrib/word_count example and just
> > re-running it multiple times. We are hoping someone can verify the bug.
> > Re-run the tests, and the word count in /tmp/word_count3/part-r-00000
> > will be 1000 + ~200, and it will change if you blow the data away and
> > re-run. Notice that the setup script loops and inserts only 1000
> > records, so we expect the count to be 1000. Once the data is generated,
> > re-running the setup script and/or MapReduce doesn't change the number
> > (it is still off). The key is to blow all the data away and start over,
> > which will cause it to change.
> > Can someone please verify this behavior?
> > -Corey
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com