cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.
Date Tue, 03 Jan 2017 20:49:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796141#comment-15796141 ]

Jason Brown commented on CASSANDRA-4663:
----------------------------------------

[~iksaif] Unfortunately, I believe this patch isn't going to do what you want. In the existing
code, {{connectionsPerHost}} is passed down to {{StreamCoordinator}}, where it is primarily
used on the side that is transferring files (see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/streaming/StreamCoordinator.java#L189).
In {{StreamCoordinator#sliceSSTableDetails()}}, we essentially round robin the sstable chunks
across the number of connections we will use. It's also referenced from {{HostStreamingData#getOrCreateNextSession()}},
but that's not going to increase the inbound sessions (see next paragraph).
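
To illustrate the round-robin idea only, here is a minimal sketch. It is not the actual {{StreamCoordinator}} code: the class name {{RoundRobinSlices}}, the method {{slice}}, and the use of plain strings in place of sstable chunks are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; names and types are not taken from Cassandra.
public class RoundRobinSlices {

    /**
     * Distributes items across at most `connections` buckets in round-robin
     * order: item i goes to bucket (i % bucketCount).
     */
    static <T> List<List<T>> slice(List<T> items, int connections) {
        if (connections < 1) {
            throw new IllegalArgumentException("connections must be >= 1");
        }
        // Never create more buckets than there are items to send.
        int bucketCount = Math.min(connections, items.size());
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            buckets.get(i % bucketCount).add(items.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Stand-ins for sstable chunks queued for one peer.
        List<String> chunks = Arrays.asList("a-1", "a-2", "b-1", "b-2", "c-1");
        // With two connections, the chunks alternate between the two buckets.
        System.out.println(slice(chunks, 2));
    }
}
```

The point of the sketch is that the split happens on the side that owns the files to send, which is why setting the value only on the receiving side would not, by itself, add parallelism.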

Your current patch sets the {{connectionsPerHost}} at the receiver end of the stream (even
though it would be the node that is initiating the stream session). As it's a bootstrapping
node that is requesting the ranges from the peer, and given the structure and protocol of
the current stream session implementation (session and protocol are essentially sequential,
single threaded, and expect a single socket), I think you'd need to get down into session
and protocol management code (and muck with a whole lot) in order to parallelize from the
(non-initiating) sending node.

I'm happy to be wrong about my understanding here, so if you've tested it out and have seen
good gains, please share :).

> Streaming sends one file at a time serially. 
> ---------------------------------------------
>
>                 Key: CASSANDRA-4663
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 0001-streaming-add-a-way-to-configure-the-number-of-conne.patch
>
>
> This is not fast enough when someone is using SSDs and perhaps a 10G link. We should try
> to create multiple connections and send multiple files in parallel.
> The current approach underutilizes the link (even 1G).
> This change will improve the bootstrapping time of a node. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
