cassandra-commits mailing list archives

From "Jouni Hartikainen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4784) Create separate sstables for each token range handled by a node
Date Sat, 09 Feb 2013 12:39:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575153#comment-13575153 ]

Jouni Hartikainen commented on CASSANDRA-4784:
----------------------------------------------

I'm not really sure I've understood this correctly, but wouldn't this change make memtable
flushes generate much more random I/O than before? Especially with vnodes, wouldn't the
incoming data be spread across num_tokens files per CF instead of one per CF? Wouldn't this
affect compactions as well? E.g. with the default size-tiered strategy, instead of compacting
4 larger SSTables into one even larger one per CF, we would be compacting num_tokens * 4
smaller files into num_tokens larger ones per CF.

Am I missing something here?
                
> Create separate sstables for each token range handled by a node
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-4784
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4784
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: sankalp kohli
>            Assignee: Benjamin Coverston
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.0
>
>         Attachments: 4784.patch
>
>
> Currently, each sstable has data for all the ranges that the node is handling. If we change
> that and instead keep separate sstables for each range the node handles, it can lead to
> some improvements.
> Improvements
> 1) Node rebuild will be very fast, as sstables can be copied directly over to the
> bootstrapping node, minimizing any application-level logic. We can use Linux native methods
> to transfer sstables without burning CPU and with less pressure on the serving node. In
> theory, I think it will be the fastest way to transfer data.
> 2) Backups need only transfer the sstables that belong to a node's primary key range.
> 3) An ETL process can copy just one replica of the data and will be much faster.
> Changes:
> We can split the writes into multiple memtables, one for each range the node is handling.
> The sstables flushed from these can record which range of data they hold.
> There should be no change for reads, I think, as they already work with interleaved data
> anyway. But maybe we can improve there as well?
> Complexities:
> The change does not look very complicated. I am not taking into account how it will work
> when ranges are being changed for nodes.
> Vnodes might make this work more complicated. We could also have a bit on each sstable
> that says whether it holds primary data or not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
