incubator-cassandra-user mailing list archives

From Joost Ouwerkerk <jo...@openplaces.org>
Subject Re: Single Split ColumnFamilyRecordReader returns duplicate rows
Date Sat, 01 May 2010 16:19:03 GMT
Created CASSANDRA-1042.

On Sat, May 1, 2010 at 12:01 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> Can you create a ticket?
>
> On Fri, Apr 30, 2010 at 4:55 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
>> There's a bug in ColumnFamilyRecordReader that appears when processing
>> a single split.  When the start and end tokens of the split are equal,
>> duplicate rows can be returned.
>>
>> Example with 5 rows:
>> token (start and end) = 53193025635115934196771903670925341736
>>
>> Tokens returned by first get_range_slices iteration:
>>  16955237001963240173058271559858726497
>>  40670782773005619916245995581909898190
>>  99079589977253916124855502156832923443
>>  144992942750327304334463589818972416113
>>  166860289390734216023086131251507064403
>>
>> Tokens returned by next iteration (start token is the last token from
>> the previous iteration, end token is unchanged):
>>  16955237001963240173058271559858726497
>>  40670782773005619916245995581909898190
>>
>> Tokens returned by final iteration (start token is the last token from
>> the previous iteration, end token is unchanged):
>>  [] (empty)
>>
>> In this example, the mapper has processed 7 rows in total, 2 of which
>> were duplicates.
>>
>> Joost.
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
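
The paging behaviour in the quoted report can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Cassandra's actual code: get_range_slices here is a hypothetical in-memory stand-in for the Thrift call, it is assumed to select tokens in (start, end] with wrap-around, to treat equal start and end tokens as the whole ring, and to return rows in ascending token order (which is what the token lists in the report suggest). The names RING, SPLIT_TOKEN and scan_split are made up for illustration.

```python
# Toy model of the paging loop from the report; not Cassandra's real implementation.
RING = [
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
]
SPLIT_TOKEN = 53193025635115934196771903670925341736


def get_range_slices(start, end, count):
    """Hypothetical stand-in: tokens in (start, end], ascending, at most `count`."""
    if start == end:
        matched = RING  # assumption: equal tokens select the whole ring
    elif start < end:
        matched = [t for t in RING if start < t <= end]
    else:
        # start > end: the range wraps around the end of the ring
        matched = [t for t in RING if t > start or t <= end]
    return sorted(matched)[:count]


def scan_split(start, end, batch_size=100):
    """Page through a split, restarting each batch from the last token seen."""
    rows = []
    while True:
        batch = get_range_slices(start, end, batch_size)
        if not batch:
            return rows
        rows.extend(batch)
        start = batch[-1]  # end token is left unchanged, as in the report


rows = scan_split(SPLIT_TOKEN, SPLIT_TOKEN)
print(len(rows), len(rows) - len(set(rows)))  # prints "7 2"
```

In this model the second request becomes a wrapping range from the highest token back to the split token, so the two rows below the split token are returned a second time. That matches the 7 rows and 2 duplicates in the example.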
