cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-12229) Move streaming to non-blocking IO and netty (streaming 2.1)
Date Fri, 12 May 2017 22:05:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008698#comment-16008698
] 

Jason Brown edited comment on CASSANDRA-12229 at 5/12/17 10:04 PM:
-------------------------------------------------------------------

[~aweisberg] created a [PR|https://github.com/jasobrown/cassandra/pull/1/files], and added
a bunch of comments. I took his feedback, and created a new branch and a [new PR|https://github.com/jasobrown/cassandra/pull/2/files]
for comments.

Significant changes in this rev:

- Ariel suggested moving the disk IO off the event loop on the sending side and keeping
blocking IO for the disk reads. Doing this allowed me to go back and reuse the {{StreamReader}}/{{StreamWriter}}
set of classes. Getting the disk reads to happen on the event loop required some back flips,
so ditching that code is not a bad thing.
- While reverting to the {{StreamReader}} classes, I could also revert the {{StreamMessage}}
changes.
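
A minimal sketch of the "blocking disk reads off the event loop" arrangement described above, using only the JDK. It is not code from the patch: {{OffLoopFileSender}}, {{send}} and the {{writeToChannel}} callback are hypothetical names, and a single-threaded executor stands in for a netty event loop.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from the patch.
public class OffLoopFileSender
{
    // Stand-in for a netty event loop: a single thread that should never block on disk.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // A separate thread that is allowed to block on disk reads.
    private final ExecutorService diskReader = Executors.newSingleThreadExecutor();

    /**
     * Reads the file with plain blocking IO on the disk thread and hands each
     * chunk to the event loop thread, which performs the (simulated) channel write.
     * Returns a future holding the number of bytes handed off.
     */
    public Future<Long> send(Path file, int chunkSize, Consumer<byte[]> writeToChannel)
    {
        return diskReader.submit(() -> {
            long total = 0;
            byte[] buf = new byte[chunkSize];
            try (InputStream in = Files.newInputStream(file))
            {
                int n;
                while ((n = in.read(buf)) > 0)
                {
                    byte[] chunk = Arrays.copyOf(buf, n);
                    // Waiting for each write is a crude form of backpressure:
                    // the disk thread blocks here, the event loop never does.
                    eventLoop.submit(() -> writeToChannel.accept(chunk)).get();
                    total += n;
                }
            }
            return total;
        });
    }

    public void shutdown()
    {
        diskReader.shutdown();
        eventLoop.shutdown();
    }
}
```

The point of the split is that only the disk thread ever waits on IO; the event loop thread just receives ready-made buffers to write.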

Reverting (and lightly modifying) those classes resulted in nearly the same performance
(and there's always more tuning to be done), with a ~40% reduction in the size of the patch set against trunk.

A few oddities need to be cleaned up:
- SwappingByteBufDataOutputStreamPlus - this came out of an experimental branch for
CASSANDRA-8457. The basic idea for this class is sound, but the naming and implementation
might be a bit funky.
- restoring a few unit tests
- I've (temporarily) removed the checksumming from {{StreamCompressionSerializer}}, as it
incurs about a 30% performance penalty when streaming uncompressed sstables. This cost might
be masked once files can be streamed in parallel, but I've pulled it out for now and would like
to have a discussion on it.
- There's some class/object funkiness in the modified {{StreamReader}} classes where they
cast to get a direct reference to the netty channel and whatnot. That strangeness should
be noted, but should not hold up the next round of review.
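
To make the checksumming trade-off above concrete, here is a small sketch of a per-chunk frame whose checksum can be switched on or off. The frame layout, the choice of CRC32, and the {{ChecksummedFrame}} class are assumptions made for illustration; they are not taken from {{StreamCompressionSerializer}}.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical frame layout: [flag:1][length:4][payload][crc32:4, only if flag == 1]
public class ChecksummedFrame
{
    public static ByteBuffer encode(byte[] payload, boolean withChecksum)
    {
        int size = 1 + 4 + payload.length + (withChecksum ? 4 : 0);
        ByteBuffer frame = ByteBuffer.allocate(size);
        frame.put((byte) (withChecksum ? 1 : 0));
        frame.putInt(payload.length);
        frame.put(payload);
        if (withChecksum)
            frame.putInt((int) crc(payload)); // extra pass over every payload byte
        frame.flip();
        return frame;
    }

    public static byte[] decode(ByteBuffer frame)
    {
        boolean withChecksum = frame.get() == 1;
        byte[] payload = new byte[frame.getInt()];
        frame.get(payload);
        // The receiver pays for a second pass over the payload to verify it.
        if (withChecksum && frame.getInt() != (int) crc(payload))
            throw new IllegalStateException("checksum mismatch");
        return payload;
    }

    private static long crc(byte[] bytes)
    {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```

With the flag off, the sender and receiver each skip one full pass over the payload, which is the kind of per-byte cost the bullet above is weighing against integrity checking.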




> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files are transferred
between nodes in a cluster. However, the low-level details of the current streaming implementation
do not line up nicely with a non-blocking model, so I think this is a good time to review
some of those details and add in additional goodness. The current implementation assumes a
sequential or "single threaded" approach to the sending of stream messages as well as the
transfer of files. In short, after several iterative prototypes, I propose the following:
> 1) use a single bi-directional connection (instead of requiring two sockets &
two threads)
> 2) send the "non-file" {{StreamMessage}} s (basically anything not {{OutboundFileMessage}})
via the normal internode messaging. This will require a slight bit more management of the
session (the ability to look up a {{StreamSession}} from a static function on {{StreamManager}}),
but we have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just be a thing
already
> 5) If the entire sstable is to be streamed, in addition to the DATA component, transfer
all the components of the sstable (primary index, bloom filter, stats, and so on). This way
we can avoid the CPU and GC pressure from deserializing the stream into objects. File streaming
then amounts to a block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here, as well.
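
As an illustration of item 5 in the proposal above (treating file streaming as a block-level transfer), the following sketch copies every component file of an sstable to a channel byte for byte, without deserializing anything. It uses only the JDK; {{WholeSSTableTransfer}} and {{transferComponents}} are hypothetical names, and the target is assumed to be a blocking channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical illustration only; not code from the patch.
public class WholeSSTableTransfer
{
    /**
     * Copies each component file (data, index, filter, stats, ...) to the target
     * channel as raw bytes and returns the total number of bytes transferred.
     */
    public static long transferComponents(List<Path> components, WritableByteChannel target) throws IOException
    {
        long total = 0;
        for (Path component : components)
        {
            try (FileChannel source = FileChannel.open(component, StandardOpenOption.READ))
            {
                long position = 0;
                long size = source.size();
                // transferTo may move fewer bytes than requested, so loop until done.
                // Assumes a blocking target; a non-blocking one could return 0 repeatedly.
                while (position < size)
                    position += source.transferTo(position, size - position, target);
                total += size;
            }
        }
        return total;
    }
}
```

Because no rows are materialized as objects on either side, the CPU and GC cost mentioned in item 5 is avoided; a real implementation would still need to tell the receiver each component's name and length.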


