incubator-cassandra-user mailing list archives

From "Adam Crain" <adam.cr...@greenenergycorp.com>
Subject RE: error using get_range_slice with random partitioner
Date Fri, 06 Aug 2010 19:57:11 GMT
Hi Jeremy,

So, I fixed my client so it preserves the ordering and I get results that may be related to
the bug.

If I insert 30 keys into the random partitioner with names [key1, key2, ... key30] and then
start the iteration (with a batch size of 10) I get the following debug output during the
iteration:

[junit] Query w/ Range(,,10) result size: 10
[junit] key18
[junit] key23
[junit] key26
[junit] key27
[junit] key12
[junit] key28
[junit] key4
[junit] key3
[junit] key1
[junit] key24
[junit] Query w/ Range(key24,,10) result size: 10
[junit] key24
[junit] key5
[junit] key17
[junit] key29
[junit] key19
[junit] key8
[junit] key15
[junit] key22
[junit] key6
[junit] key25
[junit] Query w/ Range(key25,,10) result size: 3
[junit] key25
[junit] key14
[junit] key2
[junit] Query w/ Range(key2,,10), result size: 1
[junit] key2

I never make it back around to key 18 as expected, and I never see all of the keys.
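For reference, the iteration being attempted can be sketched as below. This is a minimal in-memory simulation, not the cascal or Thrift client: `FakeStore`, `get_range`, `token`, and `iterate_all_keys` are hypothetical names made up for illustration. It assumes that keys come back ordered by the MD5-derived token rather than by key name, that the start key is inclusive (so it is dropped from every page after the first), and that an empty end key means "run to the end" with no wrap-around.

```python
import hashlib


def token(key: str) -> int:
    """Illustrative stand-in for a random-partitioner token: MD5 of the key as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class FakeStore:
    """Hypothetical single-node stand-in; keeps keys sorted by token, not by name."""

    def __init__(self, keys):
        self._keys = sorted(keys, key=token)

    def get_range(self, start_key: str, count: int):
        """Return up to `count` keys in token order, starting at `start_key` (inclusive).

        An empty start key means "from the beginning". There is no wrap-around.
        """
        if start_key == "":
            start = 0
        else:
            start_token = token(start_key)
            start = next(
                (i for i, k in enumerate(self._keys) if token(k) >= start_token),
                len(self._keys),
            )
        return self._keys[start:start + count]


def iterate_all_keys(store, batch_size=10):
    """Page through every key exactly once, using the last key seen as the next start."""
    if batch_size < 2:
        # With a batch of 1, every page after the first holds only the start key.
        raise ValueError("batch_size must be at least 2")
    seen = []
    start_key = ""
    while True:
        page = store.get_range(start_key, batch_size)
        # The start key is inclusive, so skip it on every page after the first.
        if start_key and page and page[0] == start_key:
            page = page[1:]
        if not page:
            break  # only the start key (or nothing) came back: iteration is done
        seen.extend(page)
        start_key = page[-1]
    return seen


keys = [f"key{i}" for i in range(1, 31)]
all_keys = iterate_all_keys(FakeStore(keys), batch_size=10)
```

Under those assumptions the loop ends when a page contains nothing beyond the start key, and every key is visited once even though the order looks random.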

-Adam

-----Original Message-----
From: Jeremy Hanna [mailto:jeremy.hanna1234@gmail.com]
Sent: Fri 8/6/2010 11:45 AM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
Sounds like what you're seeing is in the client, but there was another duplicate bug with
get_range_slice that was recently fixed on cassandra-0.6 branch.  It's slated for 0.6.5 which
will probably be out sometime this month, based on previous minor releases.

https://issues.apache.org/jira/browse/CASSANDRA-1145

On Aug 6, 2010, at 10:29 AM, Adam Crain wrote:

> Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I just discovered
> that the client I'm using mutates the order of keys after retrieving the result with the Thrift
> API... pretty much making key iteration impossible. So time to fork and see if they'll fix
> it :(.
> 
> I'll review yours as soon as I get the client fixed that I'm using.
> 
> Adam
> 
> 
> -----Original Message-----
> From: daveviner@gmail.com on behalf of Dave Viner
> Sent: Fri 8/6/2010 11:28 AM
> To: user@cassandra.apache.org
> Subject: Re: error using get_range_slice with random partitioner
> 
> Funny you should ask... I just went through the same exercise.
> 
> You must use Cassandra 0.6.4.  Otherwise you will get duplicate keys.
> However, here is a snippet of perl that you can use.
> 
> use Data::Dumper;   # provides Dumper(), used in the error handler below
> 
> our $WANTED_COLUMN_NAME = 'mycol';
> get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol', QUORUM, \%map);
> 
> sub get_key_to_one_column_map
> {
>    my ($keyspace, $column_family_name, $super_column_name, $consistency_level, $returned_keys) = @_;
> 
> 
>    my($socket, $transport, $protocol, $client, $result, $predicate, $column_parent, $keyrange);
> 
>    $column_parent = new Cassandra::ColumnParent();
>    $column_parent->{'column_family'} = $column_family_name;
>    $column_parent->{'super_column'} = $super_column_name;
> 
>    $keyrange = new Cassandra::KeyRange({
>            'start_key' => '', 'end_key' => '', 'count' => 10
>    });
> 
> 
>    $predicate = new Cassandra::SlicePredicate();
>    $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];
> 
>    eval
>    {
>        $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
>        $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
>        $protocol = new Thrift::BinaryProtocol($transport);
>        $client = new Cassandra::CassandraClient($protocol);
>        $transport->open();
> 
> 
>        my($next_start_key, $one_res, $iteration, $have_more, $value, $local_count, $previous_start_key);
> 
>        $iteration = 0;
>        $have_more = 1;
>        while ($have_more == 1)
>        {
>            $iteration++;
>            $result = undef;
> 
>            $result = $client->get_range_slices($keyspace, $column_parent, $predicate, $keyrange, $consistency_level);
> 
>            # on success, results is an array of objects.
> 
>            if (scalar(@$result) == 1)
>            {
>                # we only got 1 result... check to see if it's the
>                # same key as the start key... if so, we're done.
>                if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
>                {
>                    $have_more = 0;
>                    last;
>                }
>            }
> 
>            # check to see if we are starting with some value
>            # if so, we throw away the first result.
>            if ($keyrange->{'start_key'})
>            {
>                shift(@$result);
>            }
>            if (scalar(@$result) == 0)
>            {
>                $have_more = 0;
>                last;
>            }
> 
>            $previous_start_key = $keyrange->{'start_key'};
>            $local_count = 0;
> 
>            for (my $r = 0; $r < scalar(@$result); $r++)
>            {
>                $one_res = $result->[$r];
>                $next_start_key = $one_res->{'key'};
> 
>                $keyrange->{'start_key'} = $next_start_key;
> 
>                if (!exists($returned_keys->{$next_start_key}))
>                {
>                    $have_more = 1;
>                    $local_count++;
>                }
> 
> 
>                next if (scalar(@{ $one_res->{'columns'} }) == 0);
> 
>                $value = undef;
> 
>                for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
>                {
>                    if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq $WANTED_COLUMN_NAME)
>                    {
>                        $value = $one_res->{'columns'}->[$i]->{'column'}->{'value'};
>                        if (!exists($returned_keys->{$next_start_key}))
>                        {
>                            $returned_keys->{$next_start_key} = $value;
>                        }
>                        else
>                        {
>                            # NOTE: prior to Cassandra 0.6.4, get_range_slices sometimes returns duplicates.
>                            #warn "Found second value for key [$next_start_key]  was [" . $returned_keys->{$next_start_key} . "] now [$value]!";
>                        }
>                    }
>                }
>                $have_more = 1;
>            } # end results loop
> 
>            if ($keyrange->{'start_key'} eq $previous_start_key)
>            {
>                $have_more = 0;
>            }
> 
>        } # end while() loop
> 
>        $transport->close();
>    };
>    if ($@)
>    {
>        warn "Problem with Cassandra: " . Dumper($@);
>    }
> 
>    # cleanup
>    undef $client;
>    undef $protocol;
>    undef $transport;
>    undef $socket;
> }
> 
> 
> HTH
> Dave Viner
> 
> On Fri, Aug 6, 2010 at 7:45 AM, Adam Crain
> <adam.crain@greenenergycorp.com> wrote:
> 
>> Thomas,
>> 
>> That was indeed the source of the problem. I naively assumed that the token
>> range would help me avoid retrieving duplicate rows.
>> 
>> If you iterate over the keys, how do you avoid retrieving duplicate keys? I
>> tried this morning and I seem to get odd results. Maybe this is just a
>> consequence of the random partitioner. I really don't care about the order
>> of the iteration; what matters is that I see every key, and each key only
>> once.
>> 
>> -Adam
>> 
>> 
>> -----Original Message-----
>> From: th.heller@gmail.com on behalf of Thomas Heller
>> Sent: Fri 8/6/2010 7:27 AM
>> To: user@cassandra.apache.org
>> Subject: Re: error using get_range_slice with random partitioner
>> 
>> Wild guess here, but are you using start_token/end_token here when you
>> should be using start_key? Looks to me like you are trying end_token
>> = ''.
>> 
>> HTH,
>> /thomas
>> 
>> On Thursday, August 5, 2010, Adam Crain <adam.crain@greenenergycorp.com>
>> wrote:
>>> Hi,
>>> 
>>> I'm on 0.6.4. Previous tickets in the JIRA and searching the web indicated
>>> that iterating over the keys in a keyspace is possible, even with the random
>>> partitioner. This is mostly desirable in my case for testing purposes only.
>>> 
>>> I get the following error:
>>> 
>>> [junit] Internal error processing get_range_slices
>>> [junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices
>>> 
>>> and the following server traceback:
>>> 
>>> java.lang.NumberFormatException: Zero length BigInteger
>>>        at java.math.BigInteger.<init>(BigInteger.java:295)
>>>        at java.math.BigInteger.<init>(BigInteger.java:467)
>>>        at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
>>>        at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)
>>> 
>>> I am using the Scala cascal client, but am sure that get_range_slice is
>>> being called with start and stop set to "".
>>> 
>>> 1) Is batch iteration possible with the random partitioner?
>>> 
>>> This isn't clear from the FAQ entry on the subject:
>>> 
>>> http://wiki.apache.org/cassandra/FAQ#iter_world
>>> 
>>> 2) The FAQ states that the start argument should be "". What should the end
>>> argument be?
>>> 
>>> thanks!
>>> Adam
>>> 