From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Date: Tue, 6 Jul 2010 16:21:52 -0400 (EDT)
Subject: [jira] Commented: (CASSANDRA-1042) ColumnFamilyRecordReader returns duplicate rows

[ https://issues.apache.org/jira/browse/CASSANDRA-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885670#action_12885670 ]

Jonathan Ellis commented on CASSANDRA-1042:
-------------------------------------------

The "correct" order when tokens are involved is ring order. (When start_key is used instead of start_token, you can't have a wrapping range, so the question should be moot.)

> ColumnFamilyRecordReader returns duplicate rows
> -----------------------------------------------
>
>                 Key: CASSANDRA-1042
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1042
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.6
>            Reporter: Joost Ouwerkerk
>            Assignee: Jeremy Hanna
>             Fix For: 0.6.4
>
>         Attachments: 1042-0_6.txt, Cassandra-1042-0_6-branch.patch.txt, CASSANDRA-1042-trunk.patch.txt, cassandra.tar.gz, duplicate_keys.rtf
>
>
> There's a bug in ColumnFamilyRecordReader that appears when processing a single split (which happens in most tests that have a small number of rows), and potentially in other cases. When the start and end tokens of the split are equal, duplicate rows can be returned.
>
> Example with 5 rows:
> token (start and end) = 53193025635115934196771903670925341736
>
> Tokens returned by first get_range_slices iteration (all 5 rows):
> 16955237001963240173058271559858726497
> 40670782773005619916245995581909898190
> 99079589977253916124855502156832923443
> 144992942750327304334463589818972416113
> 166860289390734216023086131251507064403
>
> Tokens returned by next iteration (first token is last token from previous, end token is unchanged):
> 16955237001963240173058271559858726497
> 40670782773005619916245995581909898190
>
> Tokens returned by final iteration (first token is last token from previous, end token is unchanged):
> [] (empty)
>
> In this example, the mapper has processed 7 rows in total, 2 of which were duplicates.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
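To make the ring-order point concrete, here is a minimal Java sketch that replays the five tokens from the report against an in-memory ring. It assumes (start, end] range semantics, that start == end means the whole ring, and that each page restarts from the last token seen, as the report describes. `RingOrderPaging`, `rangeInRingOrder`, `rangeSorted` and `readSplit` are hypothetical helpers for illustration, not Cassandra, Thrift or Hadoop APIs.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RingOrderPaging {
    // Hypothetical in-memory ring holding the five row tokens from the issue report.
    static final TreeSet<BigInteger> RING = new TreeSet<>();

    static {
        String[] tokens = {
            "16955237001963240173058271559858726497",
            "40670782773005619916245995581909898190",
            "99079589977253916124855502156832923443",
            "144992942750327304334463589818972416113",
            "166860289390734216023086131251507064403"
        };
        for (String t : tokens) {
            RING.add(new BigInteger(t));
        }
    }

    // Tokens in the range (start, end], returned in ring order: first the tokens after
    // start, then (if the range wraps) the tokens from the minimum up to end.
    // Assumption: start >= end means the range wraps; start == end covers the whole ring.
    static List<BigInteger> rangeInRingOrder(BigInteger start, BigInteger end, int limit) {
        List<BigInteger> out = new ArrayList<>();
        boolean wraps = start.compareTo(end) >= 0;
        for (BigInteger t : RING.tailSet(start, false)) {
            if (!wraps && t.compareTo(end) > 0) {
                break;
            }
            if (out.size() == limit) {
                return out;
            }
            out.add(t);
        }
        if (wraps) {
            for (BigInteger t : RING.headSet(end, true)) {
                if (out.size() == limit) {
                    return out;
                }
                out.add(t);
            }
        }
        return out;
    }

    // Buggy variant: same members as above, but returned in plain ascending token order.
    static List<BigInteger> rangeSorted(BigInteger start, BigInteger end, int limit) {
        List<BigInteger> members = rangeInRingOrder(start, end, Integer.MAX_VALUE);
        members.sort(null);
        return new ArrayList<>(members.subList(0, Math.min(limit, members.size())));
    }

    // Pages through one split; each new page starts from the last token of the previous page.
    static List<BigInteger> readSplit(BigInteger start, BigInteger end, int pageSize, boolean ringOrder) {
        List<BigInteger> seen = new ArrayList<>();
        BigInteger cursor = start;
        while (true) {
            List<BigInteger> page = ringOrder
                    ? rangeInRingOrder(cursor, end, pageSize)
                    : rangeSorted(cursor, end, pageSize);
            if (page.isEmpty()) {
                break;
            }
            seen.addAll(page);
            cursor = page.get(page.size() - 1);
            if (cursor.equals(end)) {
                break; // the split's end token has been read
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        BigInteger split = new BigInteger("53193025635115934196771903670925341736");
        List<BigInteger> sorted = readSplit(split, split, 100, false);
        List<BigInteger> ring = readSplit(split, split, 100, true);
        System.out.println("ascending order: " + sorted.size() + " rows, "
                + new TreeSet<>(sorted).size() + " distinct");
        System.out.println("ring order: " + ring.size() + " rows, "
                + new TreeSet<>(ring).size() + " distinct");
    }
}
```

With a page size large enough to hold all five rows, `main` prints "ascending order: 7 rows, 5 distinct" and "ring order: 5 rows, 5 distinct", matching the report. In ascending order the first page ends at 166860..., so the follow-up range (166860..., 53193...] wraps and re-reads the two smallest tokens; in ring order the first page ends at 40670..., and the follow-up range (40670..., 53193...] is empty. A start_key range cannot wrap, so for it the two orders coincide.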