Just as a follow-up, here's what seems to be the resolution:
1. Upgrading to 0.6.4 should fix this problem.
2. Using OrderPreservingPartitioner (OPP) as the partitioner should solve it
as well.
3. Prior to 0.6.4, when using RandomPartitioner as the partitioner, there's no
good way to guarantee that you see *all* row keys for a column family.
Strategies tried:
A. Iterate over the keys returned until the "start_key" is identical to the
"last key returned". When start_key == last key returned, exit.
-> This fails, since duplicate keys can appear anywhere, even as the last key
returned.
B. Iterate over the keys returned, adding them to a hash table. When an
iteration returns no new keys, assume that all keys have been seen and exit.
-> This also fails, since a particular result set can be full of duplicates
even though the iteration has not yet traversed the entire row-key spectrum.
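To make the two failure modes concrete, here is a minimal Python sketch. It
does not talk to Cassandra: fetch_page is a hypothetical stand-in that simply
replays canned pages, and the duplicate patterns are invented to mimic the
behaviour described above. The function names and the key values are
assumptions for illustration only.

```python
def make_scripted_fetch(pages):
    """Build a hypothetical fetch_page(start_key, count) that replays canned
    pages in order, ignoring its arguments. Not a real Cassandra client call."""
    remaining = iter(pages)

    def fetch_page(start_key, count):
        # Return the next canned page, or an empty list once they run out.
        return next(remaining, [])

    return fetch_page


def strategy_a(fetch_page, count):
    """Strategy A: stop when the last key returned equals start_key."""
    seen = []
    start_key = ""
    while True:
        page = fetch_page(start_key, count)
        if not page:
            return seen
        for key in page:
            if key not in seen:
                seen.append(key)
        if page[-1] == start_key:
            return seen  # exit condition from strategy A
        start_key = page[-1]


def strategy_b(fetch_page, count):
    """Strategy B: stop when a page contributes no keys we have not seen."""
    seen = set()
    start_key = ""
    while True:
        page = fetch_page(start_key, count)
        new_keys = [key for key in page if key not in seen]
        if not new_keys:
            return seen  # exit condition from strategy B
        seen.update(new_keys)
        start_key = page[-1]


# A duplicate "c" lands in the last slot of page 2, so A stops before "e", "f".
pages_a = [["a", "b", "c"], ["c", "d", "c"], ["c", "e", "f"]]
result_a = strategy_a(make_scripted_fetch(pages_a), 3)

# Page 2 is all duplicates, so B stops before ever reaching "d", "e".
pages_b = [["a", "b", "c"], ["c", "a", "b"], ["b", "d", "e"]]
result_b = strategy_b(make_scripted_fetch(pages_b), 3)
```

In both runs the loop exits while unseen keys remain in later pages, which is
the reason neither stopping rule can be trusted when duplicates are possible.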
Dave Viner
On Wed, Jul 28, 2010 at 3:48 PM, Rob Coli <rcoli@digg.com> wrote:
> On 7/28/10 2:43 PM, Dave Viner wrote:
>
>> Hi all,
>>
>> I'm having a strange result in trying to iterate over all row keys for a
>> particular column family. The iteration works, but I see the same row
>> key returned multiple times during the iteration.
>>
>> I'm using cassandra 0.6.3, and I've put the code in use at
>>
>
> For those not playing along on IRC, this was determined to be caused by :
>
> http://issues.apache.org/jira/browse/CASSANDRA-1042
>
> Which is fixed in 0.6.4.
>
> =Rob
>