cassandra-commits mailing list archives

From "Aaron Morton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator
Date Tue, 15 May 2012 22:11:08 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276274#comment-13276274
] 

Aaron Morton commented on CASSANDRA-4245:
-----------------------------------------

Was thinking about the impact of case insensitive comparisons.

Say we have the values: aaron, Aaron, AARON, Äaron, Bob and bob. Using a case-insensitive,
accent-sensitive collation, the order should be (I am using bytes as a secondary ordering, and
guessing Ä sorts after the unaccented A):

1. AARON, Aaron, aaron
2. Äaron
3. Bob, bob
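
As a rough illustration of the ordering above, here is a minimal Python sketch. The function name `collation_key` and the three-level key are assumptions made for this example, not Cassandra code: base letters first, then accents, then raw UTF-8 bytes as the tie-break. A real comparator would more likely delegate to a proper collation library.

```python
import unicodedata

def collation_key(name: str):
    """Hypothetical sort key: case insensitive, accent sensitive, bytes as tie-break."""
    # Decompose accented characters (e.g. A-diaeresis -> A + combining mark), then fold case.
    decomposed = unicodedata.normalize("NFD", name).casefold()
    # Level 1: base letters only, combining marks dropped.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Level 2: accents kept. Level 3: raw UTF-8 bytes of the original name.
    return (base, decomposed, name.encode("utf-8"))

# "\u00c4" is A with diaeresis, written as an escape to avoid encoding problems.
names = ["bob", "\u00c4aron", "aaron", "Bob", "AARON", "Aaron"]
ordered = sorted(names, key=collation_key)
print(ordered)
```

Under these assumptions the first two levels of the key define the three collation groups, and the byte level only orders names inside a group.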

We need to decide if the collation above results in three or six columns in Cassandra. 

Some examples of where the comparison is used:
 * When writing the sorted memtable we are not concerned with equality, only relative ordering,
which is: AARON, Aaron, aaron, Äaron, Bob, bob
 * When applying a mutation to a CF we are concerned with equality; relative ordering is not important.
The six names should be treated either as six unique columns or as three columns.
 * When resolving a query we are concerned with both equality and relative ordering, but the equality
is different from the examples above. We need to know that the three unaccented Aarons are
equal, and that the Bobs occur later.

If three columns, writing "AARON" then "aaron" then reading "aaron" may result in "AARON" being
returned. When reducing columns in a slice we need a deterministic way to select the column
name to use in the response. And/or the response digest needs to be calculated differently.
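
To make the three-column case concrete, here is a small Python sketch of one possible deterministic reduction. Everything in it is an assumption for illustration: `query_key` uses `str.casefold` as a stand-in for the collation, and the rule "latest timestamp wins the value, smallest byte sequence wins the name" is just one candidate rule, not something decided on this ticket.

```python
def query_key(name: str) -> str:
    # Stand-in for a case-insensitive, accent-sensitive collation (assumption).
    return name.casefold()

def reduce_versions(versions):
    """Reduce (name, value, timestamp) tuples whose names are equal under query_key."""
    if len({query_key(name) for name, _, _ in versions}) != 1:
        raise ValueError("names are not equal under the query collation")
    # Value: highest timestamp wins; value bytes break timestamp ties.
    _, value, timestamp = max(versions, key=lambda v: (v[2], v[1]))
    # Name: hypothetical rule, pick the smallest UTF-8 byte sequence so that
    # every replica and every read order produces the same response.
    name = min((n for n, _, _ in versions), key=lambda n: n.encode("utf-8"))
    return (name, value, timestamp)

replica_a = ("AARON", b"v1", 1)
replica_b = ("aaron", b"v2", 2)
result = reduce_versions([replica_a, replica_b])
print(result)
```

With this rule the newer value written under "aaron" is returned under the name "AARON", which is the surprising behaviour described above, but the result does not depend on the order in which the versions are seen, so digests computed from it would agree.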
 
 
If six columns, comparators need to support a "unique ordering" that is used in memtables and
sstables, and a "query ordering" used when slicing. In the example, query ordering results
in 3 unique values, while unique ordering results in 6.
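
A possible shape for the six-column case, sketched in Python. The names `query_key`, `unique_key` and `slice_equal` are hypothetical. The assumption is that the unique ordering is the query ordering extended with a byte-level tie-break, so data stored in unique order is also sorted by query order and can be sliced with a binary search.

```python
import bisect
import unicodedata

def query_key(name: str):
    """Hypothetical query ordering: case insensitive, accent sensitive."""
    decomposed = unicodedata.normalize("NFD", name).casefold()
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, decomposed)

def unique_key(name: str):
    """Hypothetical unique ordering: query ordering plus raw bytes as tie-break."""
    return query_key(name) + (name.encode("utf-8"),)

def slice_equal(stored, name):
    """Return every stored name equal to `name` under the query ordering."""
    keys = [query_key(n) for n in stored]
    lo = bisect.bisect_left(keys, query_key(name))
    hi = bisect.bisect_right(keys, query_key(name))
    return stored[lo:hi]

# "\u00c4" is A with diaeresis.
names = ["bob", "\u00c4aron", "aaron", "Bob", "AARON", "Aaron"]
stored = sorted(names, key=unique_key)            # six distinct columns
query_values = {query_key(n) for n in stored}     # three distinct query values
print(len(stored), len(query_values), slice_equal(stored, "aaron"))
```

The binary search only works because `unique_key` starts with `query_key`; if the two orderings were unrelated, slicing would need a separate index or a full scan.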

I _think_ 3 columns is what we want. Thoughts ? 

wrt the configuration, collation could be a CF level configuration used by comparators that
support it. Per column collation would only be used by secondary indexing and seems a little
overkill. 
                
> Provide a UT8Type (case insensitive) comparator
> -----------------------------------------------
>
>                 Key: CASSANDRA-4245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Ertio Lew
>            Priority: Minor
>
> It is a common use case to use a bunch of entity names as column names & then use
the row as a search index, using search by range. For such use cases & others, it is useful
to have a UTF8 comparator that provides case insensitive ordering of columns.


       
