incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Single Split ColumnFamilyRecordReader returns duplicate rows
Date Sat, 01 May 2010 04:01:49 GMT
Can you create a ticket?

On Fri, Apr 30, 2010 at 4:55 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> There's a bug in ColumnFamilyRecordReader that appears when processing
> a single split.  When the start and end tokens of the split are equal,
> duplicate rows can be returned.
>
> Example with 5 rows:
> token (start and end) = 53193025635115934196771903670925341736
>
> Tokens returned by first get_range_slices iteration:
>  16955237001963240173058271559858726497
>  40670782773005619916245995581909898190
>  99079589977253916124855502156832923443
>  144992942750327304334463589818972416113
>  166860289390734216023086131251507064403
>
> Tokens returned by next iteration (first token is last token from
> previous, end token is unchanged)
>  16955237001963240173058271559858726497
>  40670782773005619916245995581909898190
>
> Tokens returned by final iteration  (first token is last token from
> previous, end token is unchanged)
>  [] (empty)
>
> In this example, the mapper has processed 7 rows in total, 2 of which
> were duplicates.
>
> Joost.
>
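The iterations quoted above can be reproduced with a small simulation. This is only a sketch, not the actual ColumnFamilyRecordReader or get_range_slices code: `range_slice` and `read_split` are hypothetical names, and the model assumes that a range whose start token is greater than or equal to its end token wraps around the ring and that rows come back in ascending token order, which is what the token lists in the report suggest.

```python
# Hypothetical model of the paging loop described in the report; not Cassandra code.
SPLIT_TOKEN = 53193025635115934196771903670925341736

# The five row tokens from the report, in ascending order.
ROW_TOKENS = sorted([
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
])


def range_slice(start, end, count):
    """Return up to `count` tokens in (start, end], in ascending token order.

    Assumption: when start >= end the range wraps around the ring, so
    start == end covers every row.
    """
    if start < end:
        hits = [t for t in ROW_TOKENS if start < t <= end]
    else:
        hits = [t for t in ROW_TOKENS if t > start or t <= end]
    return hits[:count]


def read_split(start, end, batch_size=100):
    """Page through a split the way the report describes: each iteration
    restarts from the last token seen and leaves the end token unchanged."""
    seen = []
    while True:
        batch = range_slice(start, end, batch_size)
        if not batch:
            return seen
        seen.extend(batch)
        start = batch[-1]


rows = read_split(SPLIT_TOKEN, SPLIT_TOKEN)
print(len(rows), "rows processed,", len(rows) - len(set(rows)), "duplicates")
```

Under these assumptions the second iteration asks for the range from the highest token to the unchanged end token. That range wraps past the top of the ring and picks up the two lowest tokens again, which matches the duplicates in the report.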



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
