Funny you should ask... I just went through the same exercise.

You need to be on Cassandra 0.6.4; before that release, get_range_slices sometimes returns duplicate keys.  Here is a snippet of Perl that you can use.

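use strict;
use warnings;
use Data::Dumper;

# Thrift runtime plus the Perl bindings generated from cassandra.thrift
# (module names assume the stock "thrift --gen perl" output).
use Thrift;
use Thrift::Socket;
use Thrift::BufferedTransport;
use Thrift::BinaryProtocol;
use Cassandra::Cassandra;
use Cassandra::Types;

# Placeholder connection settings; adjust for your cluster.
our $CASSANDRA_HOST = 'localhost';
our $CASSANDRA_PORT = 9160;
our $WANTED_COLUMN_NAME = 'mycol';

my %map;    # filled in as row key => value of $WANTED_COLUMN_NAME
# QUORUM is assumed to be the constant from the generated Cassandra::ConsistencyLevel package.
get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol', Cassandra::ConsistencyLevel::QUORUM, \%map);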

sub get_key_to_one_column_map
{
    my ($keyspace, $column_family_name, $super_column_name, $consistency_level, $returned_keys) = @_;
    

    my($socket, $transport, $protocol, $client, $result, $predicate, $column_parent, $keyrange);
    
    $column_parent = new Cassandra::ColumnParent();
    $column_parent->{'column_family'} = $column_family_name;
    $column_parent->{'super_column'} = $super_column_name;

    $keyrange = new Cassandra::KeyRange({ 
            'start_key' => '', 'end_key' => '', 'count' => 10 
    });


    $predicate = new Cassandra::SlicePredicate();
    $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];
    
    eval 
    {
        $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
        $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
        $protocol = new Thrift::BinaryProtocol($transport);
        $client = new Cassandra::CassandraClient($protocol);
        $transport->open();
    
    
        my($next_start_key, $one_res, $iteration, $have_more, $value, $local_count, $previous_start_key);
    
        $iteration = 0;
        $have_more = 1;
        while ($have_more == 1)
        {
            $iteration++;
            $result = undef;
    
            $result = $client->get_range_slices($keyspace, $column_parent, $predicate, $keyrange, $consistency_level);
            
            # on success, $result is a reference to an array of KeySlice objects.
    
            if (scalar(@$result) == 1)
            {
                # we only got 1 result... check to see if it's the
                # same key as the start key... if so, we're done.
                if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
                {
                    $have_more = 0;
                    last;
                }
            }
            
            # start_key is inclusive, so on every page after the first,
            # the first row repeats the last row of the previous page.
            # If we started from a key, throw that first result away.
            if ($keyrange->{'start_key'})
            {
                shift(@$result);
            }
            if (scalar(@$result) == 0)
            {
                $have_more = 0;
                last;
            }
    
            $previous_start_key = $keyrange->{'start_key'};
            $local_count = 0;

            for (my $r = 0; $r < scalar(@$result); $r++)
            {
                $one_res = $result->[$r];
                $next_start_key = $one_res->{'key'};
                
                $keyrange->{'start_key'} = $next_start_key;

                if (!exists($returned_keys->{$next_start_key}))
                {
                    $have_more = 1;
                    $local_count++;
                }
                
                
                next if (scalar(@{ $one_res->{'columns'} }) == 0);
    
                $value = undef;
                
                for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
                {
                    if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq $WANTED_COLUMN_NAME)
                    {
                        $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
                        if (!exists($returned_keys->{$next_start_key}))
                        {
                            $returned_keys->{$next_start_key} = $value;
                        }
                        else
                        {
                            # NOTE: prior to Cassandra 0.6.4, get_range_slices sometimes returns duplicates.
                            #warn "Found second value for key [$next_start_key]  was [" . $returned_keys->{$next_start_key} . "] now [$value]!";
                        }
                    }
                }
                $have_more = 1;
            } # end results loop
            
            if ($keyrange->{'start_key'} eq $previous_start_key)
            {
                $have_more = 0;
            }
            
        } # end while() loop
    
        $transport->close();
    };
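    if ($@)
    {
        warn "Problem with Cassandra: " . Dumper($@);
        # the eval died before reaching close(), so close the connection here
        $transport->close() if ($transport && $transport->isOpen());
    }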
    
    # cleanup
    undef $client;
    undef $protocol;
    undef $transport;
    undef $socket;
}
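
If the paging logic is hard to follow with all the Thrift plumbing around it, here is a stripped-down sketch of the same idea that runs without a cluster.  fake_get_range_slices and fetch_all_keys are names I made up for illustration; the fake just serves rows from an in-memory list, starting at the row whose key matches start_key (inclusive, like the real call).  It is not the real API, only a way to see why the first row of every page after the first gets thrown away.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for a column family.
# The list order plays the role of the cluster's token order.
my @ROWS = map { +{ key => "key$_", value => "val$_" } } 1 .. 7;

# Hypothetical stand-in for get_range_slices: returns up to $count rows,
# starting at the row whose key equals $start_key (inclusive), or at the
# first row when $start_key is the empty string.
sub fake_get_range_slices {
    my ($start_key, $count) = @_;
    my @page;
    my $started = ($start_key eq '');
    for my $row (@ROWS) {
        $started = 1 if $row->{key} eq $start_key;
        next unless $started;
        push @page, $row;
        last if @page == $count;
    }
    return \@page;
}

# Walk every row one page at a time and return a key => value hash ref.
sub fetch_all_keys {
    my ($page_size) = @_;
    die "page size must be at least 2\n" if $page_size < 2;

    my %seen;
    my $start_key = '';
    while (1) {
        my $page = fake_get_range_slices($start_key, $page_size);

        # start_key is inclusive, so every page after the first begins
        # with the last row of the previous page; drop that repeat.
        shift @$page if $start_key ne '';
        last unless @$page;

        $seen{ $_->{key} } = $_->{value} for @$page;
        $start_key = $page->[-1]{key};    # resume from the last key seen
    }
    return \%seen;
}

my $map = fetch_all_keys(3);
print scalar(keys %$map), " keys\n";
```

Note that the page size has to be at least 2: with a count of 1, every page after the first contains only the repeated row, so the loop cannot advance.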


HTH
Dave Viner

On Fri, Aug 6, 2010 at 7:45 AM, Adam Crain <adam.crain@greenenergycorp.com> wrote:
Thomas,

That was indeed the source of the problem. I naively assumed that the token range would help me avoid retrieving duplicate rows.

If you iterate over the keys, how do you avoid retrieving duplicate keys? I tried this morning and I seem to get odd results. Maybe this is just a consequence of the random partitioner. I don't care about the order of the iteration; what matters is that I see every key, and each key only once.

-Adam


-----Original Message-----
From: th.heller@gmail.com on behalf of Thomas Heller
Sent: Fri 8/6/2010 7:27 AM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

Wild guess here, but are you using start_token/end_token here when you
should be using start_key? Looks to me like you are trying end_token
= ''.

HTH,
/thomas

On Thursday, August 5, 2010, Adam Crain <adam.crain@greenenergycorp.com> wrote:
> Hi,
>
> I'm on 0.6.4. Previous tickets in JIRA and searching the web indicated that iterating over the keys in a keyspace is possible, even with the random partitioner. This is mostly desirable in my case for testing purposes only.
>
> I get the following error:
>
> [junit] Internal error processing get_range_slices
> [junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices
>
> and the following server traceback:
>
> java.lang.NumberFormatException: Zero length BigInteger
>         at java.math.BigInteger.<init>(BigInteger.java:295)
>         at java.math.BigInteger.<init>(BigInteger.java:467)
>         at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
>         at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)
>
> I am using the Scala cascal client, but I am sure that get_range_slices is being called with start and stop set to "".
>
> 1) Is batch iteration possible with the random partitioner?
>
> This isn't clear from the FAQ entry on the subject:
>
> http://wiki.apache.org/cassandra/FAQ#iter_world
>
> 2) The FAQ states that start argument should be "". What should the end argument be?
>
> thanks!
> Adam