cassandra-commits mailing list archives

From "Jeremiah Jordan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10495) Improve the way we do streaming with vnodes
Date Fri, 09 Oct 2015 14:39:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950481#comment-14950481 ]

Jeremiah Jordan edited comment on CASSANDRA-10495 at 10/9/15 2:38 PM:
----------------------------------------------------------------------

I don't think that method is going to help for the strategies that take the biggest hit when
this happens, LCS and DTCS.  For DTCS you would completely lose the splitting of partitions
across time ranges, and for LCS how do you pick what level to put the data in?  For a given
level data is already "compacted" so limiting by level wouldn't help.  And if you don't pick
a level, you lose the benefits of the "streaming keeps sstable level" optimizations that were
added.

An idea I had about this was to allow streaming to happen by sstable not by token range. 
So for a given sstable you only stream it once, but you skip token ranges in the file that
aren't owned by the receiver.  So you end up with at most the same number of files as the
starting node had, and for LCS/DTCS those files could stay in the same buckets/levels they
started in.
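
To make the per-sstable idea concrete, here is a minimal sketch in Python. It is not
Cassandra code: SSTable, TokenRange, owned and stream_by_sstable are hypothetical names
made up for illustration, token ranges are treated as simple non-wrapping (start, end]
intervals, and "level" stands in for whatever bucket/level metadata the compaction
strategy tracks. Each source sstable is read once, partitions outside the receiver's
ranges are skipped, and the resulting file keeps the source level.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Hypothetical types for illustration only; these are not Cassandra classes.
TokenRange = Tuple[int, int]  # (start, end], assumed non-wrapping


@dataclass
class SSTable:
    name: str
    level: int                         # LCS level (or DTCS bucket) of the file
    partitions: List[Tuple[int, str]]  # (token, payload), sorted by token


def owned(token: int, ranges: Iterable[TokenRange]) -> bool:
    """True if the token falls inside any range owned by the receiver."""
    return any(start < token <= end for start, end in ranges)


def stream_by_sstable(
    sstables: Iterable[SSTable], receiver_ranges: List[TokenRange]
) -> Iterator[SSTable]:
    """Yield at most one output file per source sstable, keeping its level."""
    for sst in sstables:
        # Read the file once and skip partitions the receiver does not own.
        kept = [(t, p) for t, p in sst.partitions if owned(t, receiver_ranges)]
        if kept:
            yield SSTable(sst.name, sst.level, kept)
```

Because the loop is over files rather than ranges, the receiver can never end up with
more files than the source had, regardless of how many ranges it owns.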


was (Author: jjordan):
I don't think that method is going to help for the strategies that take the biggest hit when
this happens, LCS and DTCS.  For DTCS you would completely lose the splitting of partitions
across time ranges, and for LCS how do you pick what level to put the data in?  For a given
level data is already "compacted" so limiting by level wouldn't help.

An idea I had about this was to allow streaming to happen by sstable not by token range. 
So for a given sstable you only stream it once, but you skip token ranges in the file that
aren't owned by the receiver.  So you end up with at most the same number of files as the
starting node had, and for LCS/DTCS those files could stay in the same buckets/levels they
started in.

> Improve the way we do streaming with vnodes
> -------------------------------------------
>
>                 Key: CASSANDRA-10495
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10495
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>             Fix For: 3.x
>
>
> Streaming with vnodes usually creates a large number of sstables on the target node -
> for example, if each source node has 100 sstables and we use num_tokens = 256, a
> bootstrapping node might get 100*256 sstables.
> One approach could be to do an on-the-fly compaction on the source node, meaning we would
> only stream out one sstable per range. Note that we will want the compaction strategy to decide
> how to combine the sstables, for example LCS will not want to mix sstables from different
> levels while STCS can probably just combine everything.
> cc [~yukim]
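
The numbers in the description, and the alternative from the comment above, can be
compared with a quick worst-case calculation. This is only a sketch under simplifying
assumptions: one source node, every sstable overlapping every range, and on-the-fly
compaction producing exactly one sstable per range (which LCS would likely not do, since
it does not want to mix levels). The function name and dictionary keys are made up for
illustration.

```python
def estimate_target_sstables(source_sstables: int, num_tokens: int) -> dict:
    """Worst-case sstable counts on the target for one source node."""
    return {
        # Current behaviour: one file per (sstable, range) pair.
        "per_range": source_sstables * num_tokens,
        # On-the-fly compaction on the source: one file per range.
        "compacted_per_range": num_tokens,
        # Streaming by sstable: at most one file per source sstable.
        "per_sstable": source_sstables,
    }


counts = estimate_target_sstables(source_sstables=100, num_tokens=256)
print(counts)
```

With 100 sstables and num_tokens = 256 this gives 25600, 256 and 100 files respectively.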



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
