cassandra-commits mailing list archives

From Piotr Kołaczkowski (JIRA) <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-4983) Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size
Date Thu, 22 Nov 2012 09:26:57 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Kołaczkowski updated CASSANDRA-4983:
------------------------------------------

    Attachment: 0001-CASSANDRA-4983-CFRR-able-to-iterate-over-more-than-o.patch
    
> Improve range wrap-around in CFIF: CFIF shouldn't produce input splits of very tiny size
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4983
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4983
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.1.6
>            Reporter: Piotr Kołaczkowski
>            Assignee: Piotr Kołaczkowski
>            Priority: Minor
>         Attachments: 0001-CASSANDRA-4983-CFRR-able-to-iterate-over-more-than-o.patch
>
>
> Currently, CFIF divides the wrap-around split into two non-wrap-around splits. While this simplifies the CFRR implementation, the approach has several minor downsides:
>  * One of the splits can be extremely small. One of our (picky) customers suspected a bug because one of his map tasks finished in 1 second, while all the rest took minutes. A very small task also wastes resources: more of them go to launching the task than to doing any real work.
>  * The number of map tasks is always one more than (expected rows / cassandra.input.split.size), so it is always >= 2. This confuses customers.
>  * Progress reporting for the two parts of the divided split is inaccurate. Even if the parts are similar in size, the progress bar goes to about 50% and then immediately to 100%, because their sizes cannot be estimated properly (the size estimation is done before the wrap-around is removed).
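
To illustrate the first two points, here is a minimal Python sketch of cutting a wrap-around range at the ring boundary. It is not the CFIF code: the toy ring of 1000 tokens and the helper names `width` and `unwrap` are assumptions made up for this example, and ranges are treated as (start, end] with tokens in [0, RING_SIZE).

```python
RING_SIZE = 1000  # assumption: toy token space [0, 1000); real rings are far larger


def width(start, end):
    """Number of tokens covered by the range (start, end], allowing wrap-around."""
    return end - start if start < end else RING_SIZE - start + end


def unwrap(start, end):
    """Return the range as one or two pieces that never cross the ring boundary."""
    if start < end:
        return [(start, end)]
    pieces = [(start, RING_SIZE)]   # part before the ring boundary
    if end > 0:
        pieces.append((0, end))     # part after the ring boundary
    return pieces


# Four equally sized splits of 250 tokens; only the first one wraps around.
splits = [(990, 240), (240, 490), (490, 740), (740, 990)]
tasks = [piece for s in splits for piece in unwrap(*s)]

for start, end in tasks:
    print(f"({start}, {end}]  width={width(start, end)}")
```

In this sketch the four requested splits become five tasks, and the piece (990, 1000] covers only 10 tokens while every other piece covers 240 or 250. That is the kind of very small map task and off-by-one task count described above.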

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
