Date: Sat, 1 May 2010 12:19:03 -0400
From: Joost Ouwerkerk
To: user@cassandra.apache.org
Subject: Re: Single Split ColumnFamilyRecordReader returns duplicate rows

Created CASSANDRA-1042.

On Sat, May 1, 2010 at 12:01 AM, Jonathan Ellis wrote:
> Can you create a ticket?
>
> On Fri, Apr 30, 2010 at 4:55 PM, Joost Ouwerkerk wrote:
>> There's a bug in ColumnFamilyRecordReader that appears when processing
>> a single split. When the start and end tokens of the split are equal,
>> duplicate rows can be returned.
>>
>> Example with 5 rows:
>> token (start and end) = 53193025635115934196771903670925341736
>>
>> Tokens returned by first get_range_slices iteration:
>>   16955237001963240173058271559858726497
>>   40670782773005619916245995581909898190
>>   99079589977253916124855502156832923443
>>   144992942750327304334463589818972416113
>>   166860289390734216023086131251507064403
>>
>> Tokens returned by next iteration (first token is last token from
>> previous, end token is unchanged):
>>   16955237001963240173058271559858726497
>>   40670782773005619916245995581909898190
>>
>> Tokens returned by final iteration (first token is last token from
>> previous, end token is unchanged):
>>   [] (empty)
>>
>> In this example, the mapper has processed 7 rows in total, 2 of which
>> were duplicates.
>>
>> Joost.
>>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
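The paging pattern in the report can be reproduced with a small simulation. This is a minimal sketch, not Cassandra code: it assumes (as the token listings suggest) that a range whose start equals its end is treated as the whole ring and returned in ascending token order, that the start token of each range is exclusive, and that a range with start > end wraps around the ring. The class RangeScanSim and the methods rangeSlice and scanSplit are hypothetical stand-ins for get_range_slices and the reader's paging loop, and nothing here reflects the change actually made for CASSANDRA-1042.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical simulation of the single-split paging loop; not Cassandra's API. */
public class RangeScanSim {

    // Hypothetical stand-in for get_range_slices over a ring range (start, end].
    // Assumption: start == end means "whole ring", returned in ascending order.
    static List<BigInteger> rangeSlice(TreeSet<BigInteger> ring, BigInteger start,
                                       BigInteger end, int count) {
        List<BigInteger> out = new ArrayList<>();
        if (start.equals(end)) {
            out.addAll(ring);                                  // full ring, min to max
        } else if (start.compareTo(end) < 0) {
            out.addAll(ring.subSet(start, false, end, true));  // plain range (start, end]
        } else {
            out.addAll(ring.tailSet(start, false));            // wrapping: (start, max] ...
            out.addAll(ring.headSet(end, true));               // ... then [min, end]
        }
        return out.size() > count ? new ArrayList<>(out.subList(0, count)) : out;
    }

    // Mimics the reader as described in the report: page until an empty batch,
    // starting each page at the last token returned by the previous page.
    static List<BigInteger> scanSplit(TreeSet<BigInteger> ring, BigInteger start,
                                      BigInteger end, int pageSize) {
        List<BigInteger> emitted = new ArrayList<>();
        BigInteger pageStart = start;
        while (true) {
            List<BigInteger> page = rangeSlice(ring, pageStart, end, pageSize);
            if (page.isEmpty()) {
                return emitted;
            }
            emitted.addAll(page);
            pageStart = page.get(page.size() - 1);
            // Guard added for the sketch: reaching the end token means the split is done
            // (otherwise start == end would be read as "whole ring" again).
            if (pageStart.equals(end)) {
                return emitted;
            }
        }
    }

    public static void main(String[] args) {
        TreeSet<BigInteger> ring = new TreeSet<>();
        for (String t : new String[] {
                "16955237001963240173058271559858726497",
                "40670782773005619916245995581909898190",
                "99079589977253916124855502156832923443",
                "144992942750327304334463589818972416113",
                "166860289390734216023086131251507064403"}) {
            ring.add(new BigInteger(t));
        }
        BigInteger token = new BigInteger("53193025635115934196771903670925341736");
        List<BigInteger> rows = scanSplit(ring, token, token, 100);
        System.out.println("rows processed: " + rows.size()
                + ", distinct: " + new HashSet<>(rows).size());
    }
}
```

Running main prints "rows processed: 7, distinct: 5", the same counts as in the report: the first page covers the whole ring, and the second page, starting from the largest token, wraps past the top of the ring and re-reads the two smallest tokens. Under the same assumptions, a split whose start is below its end never wraps, so each page starts strictly after the previous one and no row is returned twice.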