cassandra-commits mailing list archives

From "Rick Branson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5569) Every stream operation requires checking indexes in every SSTable
Date Wed, 15 May 2013 15:37:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658468#comment-13658468 ]

Rick Branson commented on CASSANDRA-5569:
-----------------------------------------

I'm now adding a second method signature that takes a "flush" boolean, so that
the behavior can be turned off for StreamingRepairTask.
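
A minimal sketch of the overload pattern described above. The class and method names are hypothetical stand-ins, not Cassandra's actual streaming API, and the sketch assumes (from the flag's name only) that "the behavior" is a flush performed before streaming:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the streaming entry point; not Cassandra's real class.
class TransferSketch {
    // Records what was done, so the effect of the flag is observable.
    final List<String> log = new ArrayList<>();

    // Original-style signature: existing callers keep the default (flush first).
    void transferRanges(String table, List<String> ranges) {
        transferRanges(table, ranges, true);
    }

    // Second signature: the caller decides whether to flush before streaming.
    // Assumption: "flush" guards a pre-stream flush step.
    void transferRanges(String table, List<String> ranges, boolean flush) {
        if (flush) {
            log.add("flush " + table);
        }
        log.add("stream " + table + " " + ranges);
    }
}
```

Under these assumptions, a caller such as StreamingRepairTask would pass false as the third argument, while all other callers stay on the two-argument form and see no change.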
                
> Every stream operation requires checking indexes in every SSTable
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-5569
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5569
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0
>            Reporter: Rick Branson
>            Assignee: Rick Branson
>            Priority: Minor
>              Labels: streaming
>             Fix For: 1.2.5
>
>         Attachments: 5569.txt, 5569-v2.txt
>
>
> It looks like there's a streaming performance issue when leveled compaction and vnodes
> get together. To get the candidate set of chunks to stream, the streaming system gets references
> to every SSTable for a CF. This is probably a perfectly reasonable assumption for non-vnode
> cases, because the data being streamed is likely distributed across the full SSTable set.
> This is also probably a perfectly reasonable assumption for size-tiered compaction, because
> the data is, again, likely distributed across the full SSTable set. However, for each vnode
> repair performed on LCS CF's, this scan across potentially tens of thousands of SSTables is
> wasteful considering that only a small percentage of them will actually have data for a given
> range.
> This manifested itself as "hanging" repair operations with tasks backing up on the MiscStage
> thread pool.
> The attached patch changes the streaming code so that for a given range, only SSTables
> for the requested range are checked to be included in streaming.
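
The idea in the description can be illustrated with a rough sketch: decide from each SSTable's first and last token whether it can hold data for the requested range, and skip the rest. Everything here is a hypothetical simplification, not the attached patch: the `SSTableInfo` type, the `long` tokens, the non-wrapping `(left, right]` range, and the linear scan over the SSTable list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RangeFilterSketch {
    // Hypothetical summary of an SSTable: only the smallest and largest token it holds.
    static final class SSTableInfo {
        final String name;
        final long first;
        final long last;

        SSTableInfo(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }
    }

    // True if the SSTable's span [first, last] intersects the range (left, right].
    // Assumes a non-wrapping range, i.e. left < right.
    static boolean overlaps(SSTableInfo s, long left, long right) {
        return s.last > left && s.first <= right;
    }

    // Keep only the SSTables that could contain data for the requested range.
    static List<SSTableInfo> candidates(List<SSTableInfo> all, long left, long right) {
        List<SSTableInfo> result = new ArrayList<>();
        for (SSTableInfo s : all) {
            if (overlaps(s, left, right)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

With leveled compaction each SSTable tends to cover a narrow token span, so a check like this would discard most of them for a small vnode range before any index is consulted. The sketch still visits every SSTable once; an interval-based lookup structure would avoid even that, but is left out to keep the example short.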

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira