cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5286) Streaming 2.0
Date Fri, 31 May 2013 12:54:21 GMT


Sylvain Lebresne commented on CASSANDRA-5286:

I had a quick look at the design document and went over the initial patch quickly. The patch
is fairly big, so I haven't looked at all the details, tbh. But the overall design looks very
reasonable to me, though I did notice the following points/random suggestions:
* I don't think we should use CFPair/CFPath in new code. We should use the cfId everywhere
instead (I mean, let's keep CFPair where it is in repair for now, but let's not use it anywhere else).
* Not sure about when reporting is updated on reads. On the read side it's after every row.
But what if you have a 1GB row? We'd probably want to call the reporting method
every X bytes, like we do on writes (where we call it at least every DEFAULT_CHUNK_SIZE bytes,
I believe).
* I would create a StreamException (that can have a few subclasses). And I would have StreamOperation
start return a Future<Void> that can potentially throw said StreamException. The reason
is that exceptions can carry information on what the problem was. We would also use
them in events. Typically SessionCompleteEvent could expose the actual exception rather
than just a success boolean.
* I don't think we support adding new ranges to fetch/transfer once the stream operation is
started. So to enforce that with the API, I would probably create some StreamingPlan class
(a kind of builder) that would have the requestRanges and transferRanges methods, and then
its start() method would actually start the transfer (asynchronously) and return the StreamOperation,
which could directly implement Future<Void> (I'd suggest making it extend guava's AbstractFuture
for instance).
* Shouldn't progress reporting work at a lower level than the row level on the read side?
What if you have a 1GB row? Especially since on the write side it does work at the
byte level.
* I'm not totally sold on StreamMessageListener. Only StreamSession implements it, and between
that, StreamSession, ConnectionHandler and MessageHandlers, it's a bit too hard to follow
what is doing what due to too much indirection imo. Just having StreamSession have an onStreamMessage()
(and a reportProgress(), which was weird in StreamMessageListener in the first place anyway
since it doesn't correspond to a message) and letting it do its internal dispatch would be simpler.
* It would also feel more natural to me to have onFileSend() in OutgoingMessageHandler directly
(to keep things related to writing to the socket in OutgoingMessageHandler), and to have the
StreamWriter call back some StreamSession.onFileSent() when done.
* In the same spirit of simplification, I would kill the ProgressSupport class, and have the
reader/writer call directly a
StreamSession.reportProgress(Direction in, Descriptor desc, long currentBytes, long totalBytes);
where Direction can be a simple IN/OUT enum in StreamSession. And let StreamSession decide
if it's worth firing up an event or not.
* Currently StreamOperation is not really asynchronous, right?
* Nit: I think we could slightly simplify SessionCompleteEvent and SessionPreparedEvent. Since
they already ship the operationInfo, we could just have them carry a sessionId. You would then
get the SessionInfo by doing {{event.operationInfo.sessions.get(event.sessionId)}} (provided
the latter sessions is a map).
* Nit: In StreamSession, I'd use a State enum rather than 3 booleans (prepared, ...).
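
To make the StreamingPlan, StreamException and State enum suggestions above concrete, here is a
rough sketch of the shape I have in mind. It is not taken from the patch: ranges are plain strings
as a stand-in for real token ranges, the State values other than "prepared" are made-up names, the
"empty range" failure is only there to show the error path, and it uses the JDK's
CompletableFuture in place of guava's AbstractFuture so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class StreamPlanSketch {
    /** Carries the cause of a failed stream instead of a bare success boolean. */
    static class StreamException extends Exception {
        StreamException(String message) { super(message); }
    }

    /** One enum instead of several booleans; value names are hypothetical. */
    enum State { INITIALIZED, PREPARED, STREAMING, COMPLETE, FAILED }

    /** The running operation is itself the future that callers wait on. */
    static class StreamOperation extends CompletableFuture<Void> {
        volatile State state = State.INITIALIZED;
    }

    /** Builder: ranges can only be added here, i.e. before start() is called. */
    static class StreamingPlan {
        private final List<String> requests = new ArrayList<>();
        private final List<String> transfers = new ArrayList<>();

        StreamingPlan requestRanges(String range) { requests.add(range); return this; }
        StreamingPlan transferRanges(String range) { transfers.add(range); return this; }

        /** Starts the (simulated) transfer asynchronously and returns immediately. */
        StreamOperation start() {
            StreamOperation op = new StreamOperation();
            List<String> all = new ArrayList<>(requests);
            all.addAll(transfers);
            CompletableFuture.runAsync(() -> {
                op.state = State.STREAMING;
                for (String range : all) {
                    if (range.isEmpty()) { // illustrative failure condition only
                        op.state = State.FAILED;
                        op.completeExceptionally(new StreamException("empty range"));
                        return;
                    }
                }
                op.state = State.COMPLETE;
                op.complete(null);
            });
            return op;
        }
    }

    public static void main(String[] args) {
        StreamOperation op = new StreamingPlan()
            .requestRanges("(0,100]")
            .transferRanges("(100,200]")
            .start();
        op.join(); // blocks until the asynchronous work finishes
        System.out.println(op.state);
    }
}
```

Since StreamOperation has no method for adding ranges, the "no new ranges after start" rule is
enforced by the types rather than by a runtime check, and a failure surfaces as the cause of the
exception thrown by the future.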

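For the byte-level progress reporting on the read side, something along these lines is what I
mean. Again this is only a sketch under assumptions: CHUNK_SIZE is an arbitrary value standing in
for DEFAULT_CHUNK_SIZE, the descriptor is a plain String rather than a Descriptor, the reader only
simulates consuming bytes, and StreamSession just records the calls instead of deciding whether to
fire an event.

```java
import java.util.ArrayList;
import java.util.List;

public class ProgressSketch {
    enum Direction { IN, OUT }

    /** Stand-in session: records every progress call so the behaviour is visible. */
    static class StreamSession {
        final List<String> events = new ArrayList<>();

        void reportProgress(Direction dir, String desc, long currentBytes, long totalBytes) {
            events.add(dir + " " + desc + " " + currentBytes + "/" + totalBytes);
        }
    }

    // Assumed value; the real constant may differ.
    static final int CHUNK_SIZE = 64 * 1024;

    /**
     * Simulates reading rows of the given sizes. Progress is reported after at most
     * CHUNK_SIZE bytes, so a single huge row still produces regular updates.
     */
    static void readRows(StreamSession session, String desc, long[] rowSizes) {
        long total = 0;
        for (long size : rowSizes) {
            total += size;
        }
        long read = 0;
        for (long rowSize : rowSizes) {
            long remaining = rowSize;
            while (remaining > 0) {
                long n = Math.min(remaining, CHUNK_SIZE);
                // A real reader would copy n bytes from the socket here.
                remaining -= n;
                read += n;
                session.reportProgress(Direction.IN, desc, read, total);
            }
        }
    }

    public static void main(String[] args) {
        StreamSession session = new StreamSession();
        readRows(session, "sstable-1", new long[] {200 * 1024, 10});
        session.events.forEach(System.out::println);
    }
}
```
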
bq. Still not versioned yet, but definitely we should


bq. I think we also want to support the ability to stream different versions of SSTable

I also think that would be nice, but I'm happy leaving that to a follow up ticket honestly.
As long as we ship the sstable version number, we can add that later on.
> Streaming 2.0
> -------------
>                 Key: CASSANDRA-5286
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>              Labels: streaming
>             Fix For: 2.0
> 2.0 is a good time to redesign the streaming API, including the protocol, to make streaming more
performant and reliable.
> Design goals that come up in my mind:
> *Better performance*
>   - Protocol optimization
>   - Stream multiple files in parallel (CASSANDRA-4663)
>   - Persistent connection (CASSANDRA-4660)
> *Better control*
>   - Cleaner API for error handling
>   - Integrate both IN/OUT streams into one session, so the components (bootstrap, move,
bulkload, repair...) that use streaming can manage them easily.
> *Better reporting*
>   - Better logging/tracing
>   - More metrics
>   - Progress reporting API for external client

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
