cassandra-user mailing list archives

From Dave Viner <davevi...@pobox.com>
Subject Re: iterating over all rows keys gets duplicate key returns
Date Thu, 29 Jul 2010 00:23:17 GMT
Just as a follow-up, here's what seems to be the resolution:

1. Upgrading to 0.6.4 should fix this problem.
2. Using OrderPreservingPartitioner (OPP) as the partitioner should solve it
as well.
3. Prior to 0.6.4, when using RandomPartitioner, there's no good way to
guarantee that you see *all* row keys for a column family.

Strategies tried:

A. Iterate over the keys returned until the "start_key" is identical to the
last key returned.  When start_key == last key returned, exit.
-> Fails, since duplicate keys can appear anywhere, even as the last key
returned.

B. Iterate over the keys returned, adding each key to a hash table.  When an
iteration returns no new keys, assume that all keys have been seen and exit.
-> This also fails, since a particular result set can be full of duplicates
even though the iteration has not yet traversed the entire row-key range.
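
For anyone who wants to see the two failure modes concretely, here is a
minimal sketch in Python. It is not the code I was running and it does not
talk to Cassandra: fetch_page is a hypothetical stand-in for a range-slice
call that returns up to `count` keys starting at start_key (inclusive), and
the canned pages are made-up data chosen only to reproduce the behaviour
described above.

```python
from typing import Callable, Dict, List, Set

# Hypothetical paging function (an assumption, not a real client API):
# returns up to `count` row keys starting at `start_key`, inclusive.
FetchPage = Callable[[str, int], List[str]]


def canned_backend(pages: Dict[str, List[str]]) -> FetchPage:
    """Build a stub that replays a fixed page for each start_key."""
    return lambda start_key, count: pages[start_key][:count]


def strategy_a(fetch_page: FetchPage, count: int) -> Set[str]:
    """Strategy A: stop when the last key returned equals start_key."""
    seen: Set[str] = set()
    start_key = ""
    while True:
        page = fetch_page(start_key, count)
        seen.update(page)
        if not page or page[-1] == start_key:
            return seen
        start_key = page[-1]


def strategy_b(fetch_page: FetchPage, count: int) -> Set[str]:
    """Strategy B: stop when a page contributes no previously unseen keys."""
    seen: Set[str] = set()
    start_key = ""
    while True:
        page = fetch_page(start_key, count)
        new_keys = [k for k in page if k not in seen]
        if not new_keys:
            return seen
        seen.update(new_keys)
        start_key = page[-1]


# Made-up data: six row keys, served three at a time.
ALL_KEYS = {"a", "b", "c", "d", "e", "f"}

# A well-behaved backend: each page starts at start_key and moves forward.
CORRECT = canned_backend(
    {"": ["a", "b", "c"], "c": ["c", "d", "e"], "e": ["e", "f"], "f": ["f"]}
)

# A duplicate shows up as the last key of the second page (breaks A).
DUP_AS_LAST = canned_backend({"": ["a", "b", "c"], "c": ["c", "d", "c"]})

# The second page is made up entirely of keys already seen (breaks B).
ALL_DUPS = canned_backend({"": ["a", "b", "c"], "c": ["c", "a", "b"]})
```

Against the well-behaved stub both loops return all six keys.  Against the
duplicate-returning stubs, strategy A exits with keys e and f unseen, and
strategy B exits with d, e and f unseen, which is the early termination
described in A and B above.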

Dave Viner

On Wed, Jul 28, 2010 at 3:48 PM, Rob Coli <rcoli@digg.com> wrote:

> On 7/28/10 2:43 PM, Dave Viner wrote:
>
>> Hi all,
>>
>> I'm having a strange result in trying to iterate over all row keys for a
>> particular column family.  The iteration works, but I see the same row
>> key returned multiple times during the iteration.
>>
>> I'm using cassandra 0.6.3, and I've put the code in use at
>>
>
> For those not playing along on IRC, this was determined to be caused by:
>
> http://issues.apache.org/jira/browse/CASSANDRA-1042
>
> Which is fixed in 0.6.4.
>
> =Rob
>
