cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-2677) Optimize streaming to be single-pass
Date Wed, 13 Jul 2011 14:49:01 GMT


Yuki Morishita updated CASSANDRA-2677:

    Attachment: trunk-2677.txt

> Optimize streaming to be single-pass
> ------------------------------------
>                 Key: CASSANDRA-2677
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.0
>         Attachments: trunk-2677.txt
> Streaming currently is a two-pass operation: one to write the Data component to disk
from the socket, then another to build the index and bloom filter from it.  This means we
do about 2x the I/O we would if we created the index and BF during the original write.
> For node movement this was not considered to be a Big Deal because the stream target
is not a member of the ring, so we can be inefficient without hurting live queries.  But optimizing
node movement to not require un/rebootstrap (CASSANDRA-1427) and bulk load (CASSANDRA-1278)
mean we can stream to live nodes too.
> The main obstacle here is we don't know how many keys will be in the new sstable ahead
of time, which we need to size the bloom filter correctly. We can solve this by including
that information (or a close approximation) in the stream setup -- the source node can calculate
that without hitting disk from the in-memory index summary.
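
A rough sketch of the idea described above, written in illustrative Python rather than Cassandra's Java. It is not taken from the attached patch. The function names, the one-summary-entry-per-`index_interval`-keys assumption, the 1% false-positive target, and the SHA-256-based hashing are all assumptions made for the example. The source estimates the key count from its index summary, and the receiver sizes the bloom filter up front so it can build the index and filter while writing the data in a single pass.

```python
import hashlib
import math


def estimate_keys(summary_entries, index_interval):
    """Approximate key count from an in-memory index summary.

    Assumes the summary samples one key per `index_interval` keys, so the
    result may overshoot the true count by up to index_interval - 1.
    """
    return summary_entries * index_interval


def bloom_filter_size(expected_keys, false_positive_rate):
    """Return (bits, hash_count) using the standard Bloom filter formulas:
    m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    n = max(expected_keys, 1)
    bits = math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


def receive_single_pass(rows, expected_keys, false_positive_rate=0.01):
    """Consume (key, value) byte pairs once, producing data, index, and filter.

    `expected_keys` is the estimate sent by the source during stream setup.
    """
    bits, hashes = bloom_filter_size(expected_keys, false_positive_rate)
    bloom = bytearray((bits + 7) // 8)
    data = bytearray()
    index = []  # (key, offset of the row in the data component)
    for key, value in rows:
        index.append((key, len(data)))
        for i in range(hashes):
            # Hypothetical hashing scheme: salt the key with the hash number.
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            bit = int.from_bytes(digest[:8], "big") % bits
            bloom[bit // 8] |= 1 << (bit % 8)
        data += value
    return bytes(data), index, bytes(bloom)
```

For example, a source holding 1000 summary entries at an assumed interval of 128 would send `estimate_keys(1000, 128)`, i.e. 128000, in the stream setup. A slight overestimate only makes the filter a little larger than necessary, which is why a close approximation is enough.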

This message is automatically generated by JIRA.