Subject: Re: error using get_range_slice with random partitioner
From: Jeremy Hanna
Date: Fri, 6 Aug 2010 10:39:54 -0500
To: user@cassandra.apache.org

Sounds like what you're seeing is in the client, but there was another
duplicate-results bug with get_range_slice that was recently fixed on the
cassandra-0.6 branch. It's slated for 0.6.5, which will probably be out
sometime this month, based on previous minor releases.

https://issues.apache.org/jira/browse/CASSANDRA-1145

On Aug 6, 2010, at 10:29 AM, Adam Crain wrote:

> Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I
> just discovered that the client I'm using mutates the order of keys
> after retrieving the result with the Thrift API... pretty much making
> key iteration impossible. So time to fork and see if they'll fix it :(.
>
> I'll review yours as soon as I get the client fixed that I'm using.
>
> Adam
>
>
> -----Original Message-----
> From: daveviner@gmail.com on behalf of Dave Viner
> Sent: Fri 8/6/2010 11:28 AM
> To: user@cassandra.apache.org
> Subject: Re: error using get_range_slice with random partitioner
>
> Funny you should ask... I just went through the same exercise.
>
> You must use Cassandra 0.6.4; otherwise you will get duplicate keys.
> Here is a snippet of Perl that you can use.
>
> use Data::Dumper;              # Dumper() is used in the error handler
> use Thrift::Socket;
> use Thrift::BufferedTransport;
> use Thrift::BinaryProtocol;
> use Cassandra::Cassandra;      # Thrift-generated bindings; module names depend on your build
> use Cassandra::Types;
>
> # Connection settings (assumed values; 9160 is the default Thrift port in 0.6)
> our $CASSANDRA_HOST = 'localhost';
> our $CASSANDRA_PORT = 9160;
>
> our $WANTED_COLUMN_NAME = 'mycol';
> my %map;
> get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol',
>     Cassandra::ConsistencyLevel::QUORUM, \%map);
>
> sub get_key_to_one_column_map
> {
>     my ($keyspace, $column_family_name, $super_column_name,
>         $consistency_level, $returned_keys) = @_;
>
>     my ($socket, $transport, $protocol, $client, $result, $predicate,
>         $column_parent, $keyrange);
>
>     $column_parent = new Cassandra::ColumnParent();
>     $column_parent->{'column_family'} = $column_family_name;
>     $column_parent->{'super_column'} = $super_column_name;
>
>     $keyrange = new Cassandra::KeyRange({
>         'start_key' => '', 'end_key' => '', 'count' => 10
>     });
>
>     $predicate = new Cassandra::SlicePredicate();
>     $predicate->{'column_names'} = [$WANTED_COLUMN_NAME];
>
>     eval
>     {
>         $socket = new Thrift::Socket($CASSANDRA_HOST, $CASSANDRA_PORT);
>         $transport = new Thrift::BufferedTransport($socket, 1024, 1024);
>         $protocol = new Thrift::BinaryProtocol($transport);
>         $client = new Cassandra::CassandraClient($protocol);
>         $transport->open();
>
>         my ($next_start_key, $one_res, $iteration, $have_more, $value,
>             $local_count, $previous_start_key);
>
>         $iteration = 0;
>         $have_more = 1;
>         while ($have_more == 1)
>         {
>             $iteration++;
>             $result = undef;
>
>             $result = $client->get_range_slices($keyspace, $column_parent,
>                 $predicate, $keyrange, $consistency_level);
>
>             # on success, results is an array of objects.
>
>             if (scalar(@$result) == 1)
>             {
>                 # we only got 1 result... check to see if it's the
>                 # same key as the start key... if so, we're done.
>                 if ($result->[0]->{'key'} eq $keyrange->{'start_key'})
>                 {
>                     $have_more = 0;
>                     last;
>                 }
>             }
>
>             # check to see if we are starting with some value;
>             # if so, we throw away the first result (start_key is inclusive).
>             if ($keyrange->{'start_key'} ne '')
>             {
>                 shift(@$result);
>             }
>             if (scalar(@$result) == 0)
>             {
>                 $have_more = 0;
>                 last;
>             }
>
>             $previous_start_key = $keyrange->{'start_key'};
>             $local_count = 0;
>
>             for (my $r = 0; $r < scalar(@$result); $r++)
>             {
>                 $one_res = $result->[$r];
>                 $next_start_key = $one_res->{'key'};
>
>                 $keyrange->{'start_key'} = $next_start_key;
>
>                 if (!exists($returned_keys->{$next_start_key}))
>                 {
>                     $have_more = 1;
>                     $local_count++;
>                 }
>
>                 next if (scalar(@{ $one_res->{'columns'} }) == 0);
>
>                 $value = undef;
>
>                 for (my $i = 0; $i < scalar(@{ $one_res->{'columns'} }); $i++)
>                 {
>                     if ($one_res->{'columns'}->[$i]->{'column'}->{'name'} eq
>                         $WANTED_COLUMN_NAME)
>                     {
>                         $value =
>                             $one_res->{'columns'}->[$i]->{'column'}->{'value'};
>                         if (!exists($returned_keys->{$next_start_key}))
>                         {
>                             $returned_keys->{$next_start_key} = $value;
>                         }
>                         else
>                         {
>                             # NOTE: prior to Cassandra 0.6.4,
>                             # get_range_slices sometimes returns duplicates.
>                             #warn "Found second value for key "
>                             #    . "[$next_start_key] was ["
>                             #    . $returned_keys->{$next_start_key}
>                             #    . "] now [$value]!";
>                         }
>                     }
>                 }
>                 $have_more = 1;
>             } # end results loop
>
>             if ($keyrange->{'start_key'} eq $previous_start_key)
>             {
>                 $have_more = 0;
>             }
>
>         } # end while() loop
>
>         $transport->close();
>     };
>     if ($@)
>     {
>         warn "Problem with Cassandra: " . Dumper($@);
>     }
>
>     # cleanup
>     undef $client;
>     undef $protocol;
>     undef $transport;
>     undef $socket;
> }
>
>
> HTH
> Dave Viner
>
> On Fri, Aug 6, 2010 at 7:45 AM, Adam Crain wrote:
>
>> Thomas,
>>
>> That was indeed the source of the problem.
>> I naively assumed that the token range would help me avoid retrieving
>> duplicate rows.
>>
>> If you iterate over the keys, how do you avoid retrieving duplicate
>> keys? I tried this morning and I seem to get odd results. Maybe this is
>> just a consequence of the random partitioner. I don't care about the
>> order of the iteration; what matters is that I see every key, and each
>> key only once.
>>
>> -Adam
>>
>>
>> -----Original Message-----
>> From: th.heller@gmail.com on behalf of Thomas Heller
>> Sent: Fri 8/6/2010 7:27 AM
>> To: user@cassandra.apache.org
>> Subject: Re: error using get_range_slice with random partitioner
>>
>> Wild guess here, but are you using start_token/end_token when you
>> should be using start_key? Looks to me like you are trying
>> end_token = ''.
>>
>> HTH,
>> /thomas
>>
>> On Thursday, August 5, 2010, Adam Crain wrote:
>>> Hi,
>>>
>>> I'm on 0.6.4. Previous JIRA tickets I found while searching the web
>>> indicated that iterating over the keys in a keyspace is possible, even
>>> with the random partitioner. In my case I mostly want this for testing
>>> purposes.
>>>
>>> I get the following error:
>>>
>>> [junit] Internal error processing get_range_slices
>>> [junit] org.apache.thrift.TApplicationException: Internal error processing get_range_slices
>>>
>>> and the following server traceback:
>>>
>>> java.lang.NumberFormatException: Zero length BigInteger
>>>     at java.math.BigInteger.<init>(BigInteger.java:295)
>>>     at java.math.BigInteger.<init>(BigInteger.java:467)
>>>     at org.apache.cassandra.dht.RandomPartitioner$1.fromString(RandomPartitioner.java:100)
>>>     at org.apache.cassandra.thrift.CassandraServer.getRangeSlicesInternal(CassandraServer.java:575)
>>>
>>> I am using the Scala cascal client, but am sure that get_range_slice is
>>> being called with start and stop set to "".
>>>
>>> 1) Is batch iteration possible with the random partitioner?
>>>
>>> This isn't clear from the FAQ entry on the subject:
>>>
>>> http://wiki.apache.org/cassandra/FAQ#iter_world
>>>
>>> 2) The FAQ states that the start argument should be "". What should the
>>> end argument be?
>>>
>>> thanks!
>>> Adam
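Pulling the answers in this thread together (page with start_key rather than start_token under the RandomPartitioner, begin with start_key '', then reuse the last key of each page as the next start_key and discard the repeated first row), the paging loop can be sketched without any Thrift dependency. This is a minimal sketch under stated assumptions: `collect_all_keys` and `$fetch_page` are hypothetical names, `$fetch_page` stands in for `$client->get_range_slices(...)` with a KeyRange carrying `start_key` and `count`, `@ring` is a made-up list standing in for keys in token order, and the page size must be at least 2 because every page after the first spends one slot on the repeated start key.

```perl
use strict;
use warnings;

# Collect every row key by paging with start_key.
# $fetch_page is a HYPOTHETICAL callback standing in for
# $client->get_range_slices(...): given ($start_key, $count) it returns an
# arrayref of rows ({ key => ... }), starting at $start_key inclusive, in
# the partitioner's key order ('' means "from the beginning").
sub collect_all_keys {
    my ($fetch_page, $count) = @_;
    die "count must be at least 2\n" if $count < 2;

    my $start_key = '';
    my (%seen, @keys);
    while (1) {
        my $rows    = $fetch_page->($start_key, $count);
        my $fetched = scalar @$rows;

        # Every page after the first begins with the previous page's last
        # key (start_key is inclusive), so drop that repeated row.
        shift @$rows
            if $start_key ne '' && @$rows && $rows->[0]{key} eq $start_key;

        for my $row (@$rows) {
            next if $seen{ $row->{key} }++;    # skip keys already returned
            push @keys, $row->{key};
        }

        # A short page, or a page holding only the repeated start key,
        # means the range is exhausted.
        last if $fetched < $count || !@$rows;
        $start_key = $rows->[-1]{key};
    }
    return @keys;
}

# Made-up stand-in for the server: keys listed in "token order".
my @ring  = qw(k3 k9 k1 k7 k5);
my $fetch = sub {
    my ($start, $count) = @_;
    my $i = 0;
    if ($start ne '') {
        $i++ while $i < @ring && $ring[$i] ne $start;
    }
    my $end = $i + $count - 1;
    $end = $#ring if $end > $#ring;
    return [ map { +{ key => $_ } } @ring[ $i .. $end ] ];
};

print join(' ', collect_all_keys($fetch, 2)), "\n";    # prints "k3 k9 k1 k7 k5"
```

The `%seen` check mirrors the `exists($returned_keys->{...})` test in Dave's snippet; on versions affected by the duplicate-key bugs discussed above it hides repeated rows rather than fixing them, and it cannot help if the client itself reorders keys, as Adam found with his client.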