cassandra-commits mailing list archives

From "Mck SembWever (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
Date Tue, 27 Sep 2011 09:20:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099124#comment-13099124 ]

Mck SembWever edited comment on CASSANDRA-3137 at 9/27/11 9:19 AM:
-------------------------------------------------------------------

Indeed. I could be using this asap.

The use case is...
We're using ByteOrderedPartitioner because we run incremental Hadoop jobs over one of our
column families where "events" initially come in. This CF has RF=1 and time-based UUID keys
that are manipulated so that their byte ordering is time ordered (the timestamp is put up front).
Each column has a TTL of 3 months.
After 3 months of data we saw all the data on one node. I now understand why: the token range is
the timestamp range, which runs from 1970 to 2270, so of course our 3-month period fell on one
node (with a 3-node cluster even 100 years would fall on one node).

To manage this CF properly we need either to keep moving nodes around, a cumbersome
operation, or to change the key so it's prefixed with {{timestamp % 3months}}. This would let
3 months of data cycle over the whole cluster and then wrap around again. Obviously we're leaning
towards the latter solution as it simplifies operations, but it does require this patch.
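
As a rough illustration of the prefixed key, here is a minimal Java sketch. It assumes "3 months" means 90 days, an 8-byte big-endian prefix, and the 16 UUID bytes appended after it. The class and method names are hypothetical and are not part of Cassandra or of the patch.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class RotatingKeySketch {
    // Assumption: "3 months" is approximated as 90 days, in milliseconds.
    static final long PERIOD_MILLIS = 90L * 24 * 60 * 60 * 1000;

    // Position of the timestamp inside the current period: 0 <= prefix < PERIOD_MILLIS.
    static long prefixFor(long timestampMillis) {
        return Math.floorMod(timestampMillis, PERIOD_MILLIS);
    }

    // Hypothetical key layout: 8-byte prefix followed by the 16 bytes of the UUID.
    // ByteBuffer writes big-endian by default, so for these non-negative prefixes
    // unsigned byte order matches numeric order.
    static byte[] rowKey(long timestampMillis, UUID eventId) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 16);
        buf.putLong(prefixFor(timestampMillis));
        buf.putLong(eventId.getMostSignificantBits());
        buf.putLong(eventId.getLeastSignificantBits());
        return buf.array();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        byte[] key = rowKey(now, UUID.randomUUID());
        System.out.println("prefix=" + prefixFor(now) + " keyLength=" + key.length);
    }
}
```

With a layout like this, a job that reads a recent time window can straddle the point where the prefix wraps back to zero, so the job's input key range has to be allowed to wrap as well.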

(When ColumnFamilyInputFormat (CFIF) supports IndexClause everything changes: we switch our cluster
to RandomPartitioner, use secondary indexes, and never look back...)
                
      was (Author: michaelsembwever):
    Indeed. I could be using this asap.

The use case is...
We're using ByteOrderedPartitioner because we run incremental Hadoop jobs over one of our
column families where "events" initially come in. This CF has RF=1 and time-based UUID keys
that are manipulated so that their byte ordering is time ordered (the byte-unsigned timestamp
is put up front). Each column has a TTL of 3 months.
After 3 months of data we saw all the data on one node. I now understand why: the token range is
the timestamp range, which runs from 1970 to 2270, so of course our 3-month period fell on one
node (with a 3-node cluster even 100 years would fall on one node).

To manage this CF properly we need either to keep moving nodes around, a cumbersome
operation, or to change the key so it's prefixed with {{timestamp % 3months}}. This would let
3 months of data cycle over the whole cluster and then wrap around again. Obviously we're leaning
towards the latter solution as it simplifies operations, but it does require this patch.

(When ColumnFamilyInputFormat (CFIF) supports IndexClause everything changes: we switch our cluster
to RandomPartitioner, use secondary indexes, and never look back...)
                  
> Implement wrapping intersections for ConfigHelper's InputKeyRange
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-3137
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 0.8.5
>            Reporter: Mck SembWever
>            Assignee: Mck SembWever
>            Priority: Minor
>             Fix For: 0.8.7
>
>         Attachments: CASSANDRA-3137.patch, CASSANDRA-3137.patch
>
>
> Before there was no support for multiple intersections between the split's range and
> the job's configured range.
> After CASSANDRA-3108 it is now possible.
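
To illustrate what "multiple intersections" can mean here, the following is a minimal sketch and not the code in the attached patch. It assumes tokens are plain long values on a ring of hypothetical size 100, and that ranges are half-open [start, end) and wrap when start >= end. Cassandra's own range and token classes differ; every name below is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class WrappingRangeSketch {
    // Assumption: a toy ring with tokens 0..99.
    static final long RING_SIZE = 100;

    // Splits [start, end) into at most two pieces that do not wrap.
    // start >= end is treated as wrapping around the end of the ring.
    static List<long[]> unwrap(long start, long end) {
        List<long[]> pieces = new ArrayList<>();
        if (start < end) {
            pieces.add(new long[] {start, end});
        } else {
            if (start < RING_SIZE) pieces.add(new long[] {start, RING_SIZE});
            if (end > 0) pieces.add(new long[] {0, end});
        }
        return pieces;
    }

    // Intersects two possibly wrapping ranges; the result may hold several pieces.
    static List<long[]> intersect(long aStart, long aEnd, long bStart, long bEnd) {
        List<long[]> result = new ArrayList<>();
        for (long[] a : unwrap(aStart, aEnd)) {
            for (long[] b : unwrap(bStart, bEnd)) {
                long lo = Math.max(a[0], b[0]);
                long hi = Math.min(a[1], b[1]);
                if (lo < hi) result.add(new long[] {lo, hi});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Job range [90, 10) wraps; split range [5, 95) does not.
        for (long[] piece : intersect(90, 10, 5, 95)) {
            System.out.println("[" + piece[0] + ", " + piece[1] + ")");
        }
    }
}
```

In this toy case the wrapping job range [90, 10) overlaps the split range [5, 95) in two separate pieces, [90, 95) and [5, 10), which is the situation a single-intersection implementation cannot represent.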


        
