From: Shotaro Kamio
To: user@cassandra.apache.org
Date: Fri, 18 Feb 2011 14:09:26 +0900
Subject: Re: Inconsistent result in super range slice query (reversed order)

Hi Aaron,

By "range slice" I mean get_range_slices() in the Thrift API, createSuperSliceQuery in Hector, and get_range() in pycassa. The example code in pycassa is attached below.

The problem is a little complicated to explain, so I'll try to describe it with examples. Here are 8 super column names that exist in the specific key, listed in forward order:

#0: "20031210020333/190209-20031210-4476807-s/"
#1: "20031210020333/190209-20031210-4476807-s/0"
#2: "20031210021940/190209-20031210-4476883-s/"
#3: "20031210021940/190209-20031210-4476883-s/0"
#4: "20031210022059/190209-20031210-4476885-s/"
#5: "20031210022059/190209-20031210-4476885-s/0" <-- Problem around here.
#6: "20031210022154/190209-20031210-4476888-s/"
#7: "20031210022154/190209-20031210-4476888-s/0"

There is no problem if I use super column names that exist on the key:

* Range from #0 to #3 in forward order -> OK
* Range from #0 to #5 in forward order -> OK
* Range from #0 to #7 in forward order -> OK
* Range from #7 to #0 in reverse order -> OK
* Range from #5 to #0 in reverse order -> OK
* Range from #3 to #0 in reverse order -> OK

However, because I want to scan orders within a certain range, I use column names with the character "z" appended (it sorts higher than any character in order_id).
Those column names are listed below as #1z, #3z, #5z and #7z. Note that these super column names don't actually exist on the key. (#4+ is a column name that sorts between #4 and #5.)

#0 : "20031210020333/190209-20031210-4476807-s/"
#1 : "20031210020333/190209-20031210-4476807-s/0"
#1z: "20031210020333/190209-20031210-4476807-s/z" (doesn't exist)
#2 : "20031210021940/190209-20031210-4476883-s/"
#3 : "20031210021940/190209-20031210-4476883-s/0"
#3z: "20031210021940/190209-20031210-4476883-s/z" (doesn't exist)
#4 : "20031210022059/190209-20031210-4476885-s/"
#4+: "20031210022059/190209-20031210-4476885-s/+" (doesn't exist)
#5 : "20031210022059/190209-20031210-4476885-s/0" <-- Problem around here.
#5z: "20031210022059/190209-20031210-4476885-s/z" (doesn't exist)
#6 : "20031210022154/190209-20031210-4476888-s/"
#7 : "20031210022154/190209-20031210-4476888-s/0"
#7z: "20031210022154/190209-20031210-4476888-s/z" (doesn't exist)

Then I run range slices over them:

* Range from #0 to #3z in forward order -> OK
* Range from #0 to #4+ in forward order -> OK
* Range from #0 to #5z in forward order -> OK
* Range from #0 to #7z in forward order -> OK
* Range from #7z to #0 in reverse order -> OK
* Range from #5z to #0 in reverse order -> FAIL (no result)
* Range from #4+ to #0 in reverse order -> OK
* Range from #3z to #0 in reverse order -> OK

The problem happens in this case. No error or warning is shown in the Cassandra log. I also tried dumping the data to JSON with sstable2json and restoring it with json2sstable, but the same problem occurs.

The code I used for the test is something like this.
----------------------
import sys
import pycassa

# KEYSPACE, CASSANDRA_HOST, COLUMN_FAMILY and A_KEY are site-specific
# settings that are not shown here.
client = pycassa.connect(KEYSPACE, [CASSANDRA_HOST])
cf = pycassa.ColumnFamily(client, COLUMN_FAMILY)

columns = [
    "20031210020333/190209-20031210-4476807-s/",   #0
    "20031210020333/190209-20031210-4476807-s/0",  #1
    "20031210021940/190209-20031210-4476883-s/",   #2
    "20031210021940/190209-20031210-4476883-s/0",  #3
    "20031210022059/190209-20031210-4476885-s/",   #4
    "20031210022059/190209-20031210-4476885-s/0",  #5 <-- Problem around here.
    "20031210022154/190209-20031210-4476888-s/",   #6
    "20031210022154/190209-20031210-4476888-s/0",  #7
]

# "-r": reversed order; "-f" or anything else: forward order;
# no option: list all column names.
reverse_order = False
if len(sys.argv) > 1:
    reverse_order = (sys.argv[1] == '-r')
    start_date = columns[0]
    end_date = columns[7] + "z"  # append "z" to reproduce the problem
    if reverse_order:
        # A reversed slice starts from the upper bound.
        start_date, end_date = end_date, start_date
else:
    start_date = end_date = ''  # empty bounds: no column range limit

print "start_date =", start_date, "end_date =", end_date, "reversed =", reverse_order
for key, super_columns in cf.get_range(start=A_KEY, finish=A_KEY,
                                       column_reversed=reverse_order,
                                       column_count=10000,
                                       column_start=start_date,
                                       column_finish=end_date):
    for name in super_columns:
        print "col='%s', len = %d" % (name, len(name))
-------------------------

Regards,
Shotaro

On Fri, Feb 18, 2011 at 5:19 AM, Aaron Morton wrote:
> First some terminology: when you say range slice, do you mean getting multiple rows? Or do you mean get_slice, where you return multiple super columns from one row?
>
> Your examples look like you want to get multiple super columns from one row, in which case the choice of partitioner is not important. The comparator and sub comparator specified in the CF definition control the ordering of columns. If possible I would suggest using the random partitioner.
>
> Could you provide examples of how you are doing the queries using pycassa? We may be able to help.
>
> My initial guess is that the ranges you specify for the query are not correct when using ASCII ordering for column names, e.g.
>
> 20031210 < 20031210022059/190209-20031210-4476885-s/z is true
>
> 20031210022059/190209-20031210-4476885-s/z < 20031210 is not true
>
> Try appending the highest value ASCII character to the end of 20031210.
>
> Cheers
> Aaron
>
> On 18/02/2011, at 4:35 AM, Shotaro Kamio wrote:
>
>> Hi,
>>
>> We are having trouble with strange behavior in Cassandra 0.7.2 (it also
>> happened in 0.7.0). Could someone help us?
>>
>> The problem happens on a super column family named "Order".
>> The data structure is something like:
>> Order[ a_key ][ date + "/" + order_id + "/" (+ suffix) ][attribute] = value
>>
>> For example,
>> Order[ "100" ][ "20031210022059/190209-20031210-4476885-s/" ]
>> is a super column.
>> Because we want to scan them in latest-first order, a range slice
>> query with reversed order is used. (The partitioner is
>> ByteOrderedPartitioner.)
>>
>> For some super columns in my Cassandra instance, a reversed query returns
>> no result when it should return results.
>> For instance,
>>
>> * A range slice in normal (lexical) order ( Order[ "100" ] [ from
>> "20031210" to "20031210022059/190209-20031210-4476885-s/z" ] )
>> returns results correctly:
>>
>> col='20031210014347/190209-20031210-4476668-s/'
>> col='20031210014347/190209-20031210-4476668-s/0'
>> col='20031210022059/190209-20031210-4476885-s/'
>> col='20031210022059/190209-20031210-4476885-s/0'
>>
>> * A range slice in reversed (latest-first) order ( Order[ "100" ] [ from
>> "20031210022059/190209-20031210-4476885-s/z" to "20031210" ] )
>> returns NO result!
>>
>> Note that the super column name
>> "20031210022059/190209-20031210-4476885-s/z" doesn't exist, but the query
>> should still work. And it succeeds for other super columns.
>>
>> * A range slice in reversed (latest-first) order starting from an existing
>> column name ( Order[ "100" ] [ from
>> "20031210022059/190209-20031210-4476885-s/0" to "20031210" ] )
>> returns the expected results.
>>
>> Both pycassa and Hector show the same behavior on the same column
>> name. I suspect that Cassandra has a logic error.
>>
>>
>> I'd appreciate any help.
>>
>>
>> Best regards,
>> Shotaro
>

--
Shotaro Kamio
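To make the expected result concrete, here is a small Python model of a column slice as both posters describe it. It is only a sketch under stated assumptions, not Cassandra's implementation: super column names compare in plain byte (ASCII) order, both bounds are inclusive and need not exist, an empty bound means unbounded, and in a reversed slice the start acts as the upper bound. The helper name expected_slice is hypothetical. Under this model, the reversed slice from #5z to #0 should return #5 down to #0.

```python
from bisect import bisect_left, bisect_right

# The eight existing super column names from the thread, in forward order.
COLUMNS = [
    "20031210020333/190209-20031210-4476807-s/",   # 0
    "20031210020333/190209-20031210-4476807-s/0",  # 1
    "20031210021940/190209-20031210-4476883-s/",   # 2
    "20031210021940/190209-20031210-4476883-s/0",  # 3
    "20031210022059/190209-20031210-4476885-s/",   # 4
    "20031210022059/190209-20031210-4476885-s/0",  # 5
    "20031210022154/190209-20031210-4476888-s/",   # 6
    "20031210022154/190209-20031210-4476888-s/0",  # 7
]


def expected_slice(names, start, finish, reversed_=False, count=10000):
    """Hypothetical model of a slice over column names (not Cassandra code).

    Assumes byte/ASCII ordering (plain str comparison for ASCII names),
    inclusive bounds that need not exist, and '' meaning "unbounded".
    In a reversed slice, `start` is the upper bound and `finish` the lower.
    """
    ordered = sorted(names)
    lo, hi = (finish, start) if reversed_ else (start, finish)
    # First index whose name is >= lo, and one past the last name <= hi.
    i = bisect_left(ordered, lo) if lo else 0
    j = bisect_right(ordered, hi) if hi else len(ordered)
    window = ordered[i:j]  # empty if the bounds are swapped (lo > hi)
    if reversed_:
        window.reverse()
    return window[:count]


if __name__ == "__main__":
    five_z = COLUMNS[4] + "z"  # "#5z": ".../4476885-s/z", which does not exist
    for name in expected_slice(COLUMNS, five_z, COLUMNS[0], reversed_=True):
        print(name)
```

In this model a reversed slice whose start sorts below its finish yields an empty list, which is the kind of mistake Aaron suspected. The #5z query is ordered correctly (#5z sorts above #0 and above "20031210"), which is consistent with Shotaro's point that it should return results.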