cassandra-commits mailing list archives

From "Sidharth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-4920) Add Collation to abstract type to provide standard sort order for Strings
Date Tue, 06 Nov 2012 16:54:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sidharth updated CASSANDRA-4920:
--------------------------------

    Description: 
Adding a way to sort UTF8 strings according to the collation semantics described below would
be useful.

Use case: Say, for example, you have wide rows where you cannot use Cassandra's standard indexes
(secondary/primary index). Let's say each column has a string value that is either alphanumeric
or purely numeric, and you want an index by value. More specifically, you want to slice a range
over a bunch of column values and say "get me all the IDs associated with values ABC to XYZ".
As usual, I would index these values in a materialized view.

More specifically, I create an index CF, add these values into a CompositeType column, and
SliceRange over them for the indexing to work. I don't really care whether a value is alpha
or numeric, as long as it is ordered by the following collation semantics:
1) If the string is numeric, it should be compared like a number.
2) If it is alpha, it should be compared like a normal string.
3) If it is alphanumeric, a contiguous sequence of digits in the string should be compared
as a number, e.g. "c10" > "c2".
4) UTF8 type strings are assumed everywhere.
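
The four rules above can be sketched as a single "natural order" comparator. This is only an
illustration of the requested ordering, not a patch: the class name NaturalOrderComparator and
its helper methods are hypothetical, it works on already-decoded Java strings rather than on
Cassandra's AbstractType, and it assumes that only ASCII digits 0-9 count as numeric. Non-digit
characters are compared by UTF-16 code unit, which is an assumption and may differ from UTF8
byte order for some characters.

```java
import java.util.Comparator;

/** Hypothetical sketch of the requested ordering; not part of Cassandra. */
final class NaturalOrderComparator implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Rules 1 and 3: compare whole digit runs as numbers.
                int endA = digitRunEnd(a, i), endB = digitRunEnd(b, j);
                int cmp = compareDigitRuns(a, i, endA, b, j, endB);
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else {
                // Rule 2: everything else compares like a normal string.
                if (ca != cb) return ca < cb ? -1 : 1;
                i++;
                j++;
            }
        }
        // A string that is a prefix of the other sorts first.
        int rest = Integer.compare(a.length() - i, b.length() - j);
        // Tie-break on the raw text so that "007" and "7" are not treated as equal.
        return rest != 0 ? rest : a.compareTo(b);
    }

    // Assumption: only ASCII digits are treated as numeric.
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static int digitRunEnd(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) end++;
        return end;
    }

    // Compares two digit runs by value without parsing, so long runs cannot overflow.
    private static int compareDigitRuns(String a, int startA, int endA,
                                        String b, int startB, int endB) {
        while (startA < endA && a.charAt(startA) == '0') startA++; // skip leading zeros
        while (startB < endB && b.charAt(startB) == '0') startB++;
        int lenA = endA - startA, lenB = endB - startB;
        if (lenA != lenB) return lenA < lenB ? -1 : 1; // more significant digits = larger
        for (int k = 0; k < lenA; k++) {
            char ca = a.charAt(startA + k), cb = b.charAt(startB + k);
            if (ca != cb) return ca < cb ? -1 : 1;
        }
        return 0;
    }
}
```

With this sketch, sorting the values "c10", "c2", "10000", "12" and "abc" gives
"12", "10000", "abc", "c2", "c10". Using it as a column comparator would still require
wrapping the same logic in a custom type that compares the serialized column bytes.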

How does this help?
1) You don't end up creating multiple CFs for different value types.
2) You don't have to write boilerplate to do complicated type detection and do this manually
in the application.

  was:
Adding a way to sort UTF8 strings based on a standard order (collation) is very useful. Say, for
example, you have wide rows where you cannot use Cassandra's standard indexes (secondary/primary
index). Let's say each column has a string value that is either alphanumeric or purely numeric.

Now let's say I want to index these values in a materialized view so I can look things up
by a range of values (a range makes sense given a standard ordering over my alphanumeric and
numeric strings, i.e. "12" < "10000").

More specifically, I add these values into a CompositeType and SliceRange over them for the
index to work. I don't really care whether a value is alpha or numeric; it should be in the
order given by the following collation semantics:
1) If the string is numeric, it should be compared like a number.
2) If it is alpha, it should be compared like a normal string.
3) If it is alphanumeric, a contiguous sequence of digits in the string should be compared
as a number, e.g. "c10" > "c2".
4) UTF8 type strings are assumed everywhere.

    
> Add Collation to abstract type to provide standard sort order for Strings
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4920
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Sidharth
>            Priority: Minor
>              Labels: cassandra
>
> Adding a way to sort UTF8 strings according to the collation semantics described below would
> be useful.
>
> Use case: Say, for example, you have wide rows where you cannot use Cassandra's standard
> indexes (secondary/primary index). Let's say each column has a string value that is either
> alphanumeric or purely numeric, and you want an index by value. More specifically, you want
> to slice a range over a bunch of column values and say "get me all the IDs associated with
> values ABC to XYZ". As usual, I would index these values in a materialized view.
>
> More specifically, I create an index CF, add these values into a CompositeType column, and
> SliceRange over them for the indexing to work. I don't really care whether a value is alpha
> or numeric, as long as it is ordered by the following collation semantics:
> 1) If the string is numeric, it should be compared like a number.
> 2) If it is alpha, it should be compared like a normal string.
> 3) If it is alphanumeric, a contiguous sequence of digits in the string should be compared
> as a number, e.g. "c10" > "c2".
> 4) UTF8 type strings are assumed everywhere.
>
> How does this help?
> 1) You don't end up creating multiple CFs for different value types.
> 2) You don't have to write boilerplate to do complicated type detection and do this manually
> in the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
