Date: Tue, 15 May 2012 22:11:08 +0000 (UTC)
From: "Aaron Morton (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <406520401.1552.1337119868761.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1303648213.977.1337057913491.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

    [ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276274#comment-13276274 ]

Aaron Morton commented on CASSANDRA-4245:
-----------------------------------------

I was thinking about the impact of case-insensitive comparisons.

Say we have the values: aaron, Aaron, AARON, Äaron, BOB and bob. Using a Case Insensitive, Accent Sensitive collation the order should be (I am using bytes as a secondary ordering, and guessing Ä occurs after the non-accented A):

1. AARON, Aaron, aaron
2. Äaron
3. BOB, bob

We need to decide if the collation above results in three or six columns in Cassandra.

Some examples of where the comparison is used:

* When writing the sorted memtable we are not concerned with equality, only relative ordering, which is: AARON, Aaron, aaron, Äaron, BOB, bob
* When applying a mutation to a CF we are concerned with equality; relative ordering is not important. The six values should be treated either as six unique columns or as three.
* When resolving a query we are concerned with equality and relative ordering, but the equality is different to the examples above. We need to know that the three non-accented Aarons are equal, and that the Bobs occur later.

If three columns, writing "AARON" then "aaron" and then reading "aaron" may result in "AARON" being returned. When reducing columns in a slice we need a deterministic way to select the column name to use in the response, and/or the response digest needs to be calculated differently.

If six columns, comparators need to support a "unique ordering" that is used in memtables and sstables, and a "query ordering" used when slicing. In the example, query ordering results in 3 unique values and unique ordering results in 6.

I _think_ 3 columns is what we want. Thoughts?

wrt the configuration, collation could be a CF-level configuration used by comparators that support it.
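
To make the "query ordering" and "unique ordering" above concrete, here is a rough sketch using the JDK's java.text.Collator at SECONDARY strength (case-insensitive, accent-sensitive), with UTF-8 bytes as the tie-breaker. The class and method names (CollationSketch, queryOrder, uniqueOrder) are made up for illustration and are not Cassandra APIs; the exact position of Ä depends on the collation rules in use, here the JDK's root-locale rules.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names exist in Cassandra. */
final class CollationSketch {

    // SECONDARY strength ignores case (a tertiary difference) but still
    // distinguishes accented from unaccented letters.
    static Collator newCollator() {
        Collator c = Collator.getInstance(Locale.ROOT);
        c.setStrength(Collator.SECONDARY);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    /** "Query ordering": names that differ only by case compare as equal. */
    static Comparator<String> queryOrder() {
        Collator c = newCollator();
        return (a, b) -> c.compare(a, b);
    }

    /** "Unique ordering": collation first, UTF-8 bytes as the tie-breaker. */
    static Comparator<String> uniqueOrder() {
        Comparator<String> query = queryOrder();
        return (a, b) -> {
            int r = query.compare(a, b);
            return r != 0 ? r : compareBytes(a, b);
        };
    }

    // Unsigned lexicographic comparison of the UTF-8 encodings.
    static int compareBytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) return d;
        }
        return x.length - y.length;
    }

    /** Returns a sorted copy of the names under the given ordering. */
    static List<String> sorted(List<String> names, Comparator<String> order) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    /** Counts how many names the given ordering treats as distinct. */
    static int distinct(List<String> names, Comparator<String> order) {
        TreeSet<String> set = new TreeSet<>(order);
        set.addAll(names);
        return set.size();
    }

    public static void main(String[] args) {
        // \u00C4 is Ä, escaped to avoid source-encoding issues.
        List<String> names = Arrays.asList(
                "aaron", "Aaron", "AARON", "\u00C4aron", "BOB", "bob");
        System.out.println(sorted(names, uniqueOrder()));
        System.out.println(distinct(names, queryOrder()));
        System.out.println(distinct(names, uniqueOrder()));
    }
}
```

The only difference between the two comparators is the byte tie-breaker, which is what separates the three-column outcome from the six-column one.
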
Per-column collation would only be used by secondary indexing and seems a little overkill.

> Provide a UT8Type (case insensitive) comparator
> -----------------------------------------------
>
>                 Key: CASSANDRA-4245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Ertio Lew
>            Priority: Minor
>
> It is a common use case to use a bunch of entity names as column names & then use the row as a search index, using search by range. For such use cases & others, it is useful to have a UTF8 comparator that provides case insensitive ordering of columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira