cassandra-commits mailing list archives

From "Nirmal Ranganathan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1189) Refactor streaming
Date Wed, 21 Jul 2010 16:13:54 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890762#action_12890762 ]

Nirmal Ranganathan commented on CASSANDRA-1189:
-----------------------------------------------

Here are some proposed changes; please comment with feedback. Streaming happens in two scenarios:


Source transfers to Destination (Anti-entropy repair, node decommission, possibly bulk import)
- In each of these cases, the source has a list of sstable files it needs to transfer to the destination.
- The source maintains the list of all the files and creates a session id for transferring this set of files.
- The source streams the first file; its header contains a new StreamHeader with the PendingFile info embedded.
- The destination receives the stream, which carries all the info for the file, and once done responds with a StreamStatus message.
- If the StreamStatus reports success, the source continues with the next file; if not, it retransfers the same file, repeating until all files are complete.
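A minimal Java sketch of the source-side bookkeeping for the push flow above, assuming a simple acknowledge-or-retransfer loop. `OutboundStreamSession` and its methods are hypothetical names, not existing Cassandra classes, and a plain file name stands in for the PendingFile metadata.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch: source-side state for one push session. A plain file
// name stands in for the PendingFile metadata the StreamHeader would carry.
final class OutboundStreamSession {
    private final UUID sessionId = UUID.randomUUID(); // identifies this set of files
    private final Deque<String> pending;              // files not yet acknowledged

    OutboundStreamSession(List<String> files) {
        this.pending = new ArrayDeque<>(files);
    }

    UUID sessionId() { return sessionId; }

    /** File to stream now, or null once every file has been acknowledged. */
    String current() { return pending.peekFirst(); }

    /**
     * Handles the StreamStatus reply for the current file: success advances to
     * the next file, failure leaves the same file in place so it is re-sent.
     * Returns the file to stream next, or null when the session is complete.
     */
    String onStreamStatus(boolean success) {
        if (pending.isEmpty()) {
            throw new IllegalStateException("session " + sessionId + " is already complete");
        }
        if (success) {
            pending.removeFirst();
        }
        return pending.peekFirst();
    }

    boolean isComplete() { return pending.isEmpty(); }
}
```

Because the session only advances on a successful StreamStatus, a failed file is simply streamed again, which is the retransfer step described above.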

(Approach 1) Destination requests from Source (Anti-entropy repair, bootstrap, possibly bulk export)
- The destination compiles a list of ranges and sends a StreamRequest message to the source, attaching a session id to keep track of the request.
- Based on the ranges, the source compiles a list of PendingFiles and sends a StreamRequestResponse message with the list of files.
- The destination now has the list of files and maintains the transfer state.
- The destination sends a StreamRequest for one file from the list, with the session id and file descriptor info attached.
- The source streams the file to the destination.
- Based on the transfer status, the destination requests the next file or re-requests the same file, until all files are transferred.
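For Approach 1 the state lives on the destination. The sketch below is hypothetical (`InboundRequestSession` and `FileRequest` are illustrative names, not Cassandra classes): it holds the file list from the StreamRequestResponse and decides which StreamRequest to send after each transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for a StreamRequest naming one file of a session.
final class FileRequest {
    final UUID sessionId;
    final String file;

    FileRequest(UUID sessionId, String file) {
        this.sessionId = sessionId;
        this.file = file;
    }
}

// Hypothetical destination-side state for Approach 1.
final class InboundRequestSession {
    private final UUID sessionId;
    private final List<String> files; // list received in the StreamRequestResponse
    private int index = 0;            // file currently being requested

    InboundRequestSession(UUID sessionId, List<String> filesFromResponse) {
        this.sessionId = sessionId;
        this.files = new ArrayList<>(filesFromResponse);
    }

    /** StreamRequest for the current file, or null once every file has arrived. */
    FileRequest currentRequest() {
        return index < files.size() ? new FileRequest(sessionId, files.get(index)) : null;
    }

    /** Success moves on to the next file; failure re-requests the same file. */
    FileRequest onTransferFinished(boolean success) {
        if (index >= files.size()) {
            throw new IllegalStateException("session " + sessionId + " has no transfer outstanding");
        }
        if (success) {
            index++;
        }
        return currentRequest();
    }
}
```

Here the source stays stateless between per-file requests; the cost is one extra request/response round trip per file compared with Approach 2.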

(Approach 2) Destination requests from Source (Anti-entropy repair, bootstrap, possibly bulk export)
- The destination compiles a list of ranges and sends a StreamRequest message to the source, attaching a session id to keep track of the request.
- The source compiles a list of PendingFiles from the requested ranges and maintains the transfer state.
- The source streams file 1 with a StreamHeader attached.
- The destination receives the file and responds with a StreamStatus.
- Based on the status, the source transfers the next file or re-transfers the same file.
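In Approach 2 the source holds the state instead. A hypothetical sketch, assuming every StreamStatus carries the session id from the original StreamRequest: keying state by that id lets several requests from the same destination proceed without mixing up acknowledgements. `SourceSessionRegistry` is an illustrative name, and compiling PendingFiles from the requested ranges is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of Approach 2: one state object per StreamRequest,
// keyed by the session id the destination attached to that request.
final class SourceSessionRegistry {
    // Different sessions may be registered from different threads; messages
    // for a single session are assumed to be handled one at a time.
    private final Map<UUID, Deque<String>> sessions = new ConcurrentHashMap<>();

    /** StreamRequest: store the files compiled for the requested ranges, return the first to send. */
    String onStreamRequest(UUID sessionId, List<String> filesForRanges) {
        Deque<String> files = new ArrayDeque<>(filesForRanges);
        if (sessions.putIfAbsent(sessionId, files) != null) {
            throw new IllegalStateException("duplicate session " + sessionId);
        }
        return finishIfEmpty(sessionId, files);
    }

    /** StreamStatus: move on after success, re-send the same file after failure. */
    String onStreamStatus(UUID sessionId, boolean success) {
        Deque<String> files = sessions.get(sessionId);
        if (files == null) {
            throw new IllegalArgumentException("unknown session " + sessionId);
        }
        if (success) {
            files.removeFirst();
        }
        return finishIfEmpty(sessionId, files);
    }

    int activeSessions() { return sessions.size(); }

    // Drops finished sessions so the map only holds transfers still in flight.
    private String finishIfEmpty(UUID sessionId, Deque<String> files) {
        String next = files.peekFirst();
        if (next == null) {
            sessions.remove(sessionId);
        }
        return next;
    }
}
```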

Changes to Protocol for File Streaming:
- Current -> | Protocol magic | Header | Body (File contents) |
- Proposed -> | Protocol magic | Header | StreamHeader size | StreamHeader | Body (File contents) |
- The protocol for all other Messages remains the same: the format is unchanged and only the content will vary.
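The proposed framing can be sketched with `DataOutputStream`/`DataInputStream`. This is an illustration under assumptions: the magic constant is a placeholder, Header is treated as an opaque 4-byte int, and StreamHeader size is written as a 4-byte int. On a real connection the body length would come from the PendingFile info inside the StreamHeader; here the frame is a standalone byte array, so the body is simply whatever follows the StreamHeader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical framing sketch:
// | Protocol magic | Header | StreamHeader size | StreamHeader | Body (file contents) |
final class StreamFrame {
    static final int PROTOCOL_MAGIC = 0x12345678; // placeholder, not Cassandra's real value

    final int header;          // existing message header, treated as an opaque int
    final byte[] streamHeader; // serialized StreamHeader (session id, PendingFile info)
    final byte[] body;         // file contents

    StreamFrame(int header, byte[] streamHeader, byte[] body) {
        this.header = header;
        this.streamHeader = streamHeader;
        this.body = body;
    }

    byte[] encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(PROTOCOL_MAGIC);
            out.writeInt(header);
            out.writeInt(streamHeader.length); // new field: StreamHeader size
            out.write(streamHeader);           // new field: StreamHeader
            out.write(body);
        } catch (IOException e) {
            // Writing to a ByteArrayOutputStream does not fail; this satisfies the checked signature.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static StreamFrame decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            if (in.readInt() != PROTOCOL_MAGIC) {
                throw new IllegalArgumentException("bad protocol magic");
            }
            int header = in.readInt();
            byte[] streamHeader = new byte[in.readInt()]; // size marks where StreamHeader ends
            in.readFully(streamHeader);
            byte[] body = in.readAllBytes();             // standalone frame: body is the remainder
            return new StreamFrame(header, streamHeader, body);
        } catch (IOException e) {
            // e.g. EOFException when the frame is truncated
            throw new UncheckedIOException(e);
        }
    }
}
```

The size prefix is what lets the receiver find the end of a variable-length StreamHeader before the file contents begin.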

Effects of the mentioned changes:
- There can be multiple transfers between the same source and destination.
- No file order is required, which prevents overlapping streams from breaking anything.
- Other services can transfer files without a problem.
- Initiate and Initiate Done will be removed, making the process a little cleaner.
- Makes it easier to add a layer on top for bulk imports/exports.

Questions:
- The current streaming does not seem to maintain persistent state if a node fails during streaming. Is that something that needs to be considered?
- Do we want to add checksums?

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7
>
>
> The current architecture is buggy because it makes the assumption that only one stream can be in process between two nodes at a given time, and stream send order never changes. Because of this, the ACK process gets fouled up when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, send).

-- 
This message is automatically generated by JIRA.

