cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13630) support large internode messages with netty
Date Wed, 23 Aug 2017 12:57:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138305#comment-16138305 ]

Jason Brown commented on CASSANDRA-13630:
-----------------------------------------

bq. I thought worst case memory amplification from this NIO approach was 2x message size which
is worse than our current 1x message size, but it's not, it's cluster size * message size
if a message is fanned out to all nodes in the cluster. 

We do not have 1x amplification in pre-4.0 code; it's always been messageSize times the number
of target peers. In `OutboundTcpConnection` we wrote into a [backing buffer of 64k|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L457]
for each outbound peer and flushed when the buffer filled up (see `BufferedDataOutputStreamPlus`).
The cost of the amplification is hidden by that reusable backing buffer, but it's still there.

With CASSANDRA-8457, everything gets its own distinct buffer, allocated once per message,
which is serialized to and then flushed. With this ticket we'll move back to the previous
model where there's a backing buffer that's used for aggregating small messages or chunks
of larger messages. That buffer, of course, is not reused, but that's because of the asynchronous
nature of NIO vs blocking IO. 
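
As a rough illustration of that backing-buffer model (not the code in the patch), the sketch below aggregates small messages and chunks larger ones into a fixed-capacity buffer, handing off each full buffer and allocating a fresh one instead of reusing it. The class name `AggregatingBufferSketch`, the `flushed` list standing in for the channel, and the tiny capacity are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one backing buffer per outbound connection.
class AggregatingBufferSketch {
    private final int capacity;
    private ByteBuffer current;

    // Stands in for the channel: buffers handed off here belong to the async write.
    final List<ByteBuffer> flushed = new ArrayList<>();

    AggregatingBufferSketch(int capacity) {
        this.capacity = capacity;
        this.current = ByteBuffer.allocate(capacity);
    }

    // Copies the message into the backing buffer, flushing each time it fills up.
    void write(byte[] message) {
        int offset = 0;
        while (offset < message.length) {
            int n = Math.min(current.remaining(), message.length - offset);
            current.put(message, offset, n);
            offset += n;
            if (!current.hasRemaining()) {
                flush();
            }
        }
    }

    // Hands the current buffer off and starts a new one; nothing is reused,
    // because the asynchronous write may still be reading the old buffer.
    void flush() {
        if (current.position() == 0) {
            return;
        }
        current.flip();
        flushed.add(current);
        current = ByteBuffer.allocate(capacity);
    }
}
```

With a capacity of 8, two 3-byte messages share one buffer, while a 20-byte message that follows is split across several buffers.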

(FTR, I have thought about moving serialization outside of the "outbound connections" (either
`OutboundTcpConnection` or netty handlers) - where we serialize before sending to the outbound
channels and send a slice of a buffer to those channels. That way you only serialize once
(less repetitive CPU work), as well as potentially consume less memory. But I think that's
a different ticket.)
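
A minimal sketch of the serialize-once idea, using only `java.nio`: the message is serialized a single time and each peer receives an independent view (its own position and limit) over the same bytes. The class `SerializeOnceSketch`, the `serialize` stand-in, and the call counter are hypothetical and exist only to make the point; none of this is Cassandra code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: serialize before handing the message to outbound channels.
class SerializeOnceSketch {
    // Counts how often the (stand-in) serializer runs.
    static int serializeCalls = 0;

    // Stand-in for real message serialization.
    static ByteBuffer serialize(String message) {
        serializeCalls++;
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.wrap(bytes).asReadOnlyBuffer();
    }

    // Serializes once, then gives every peer its own view of the shared bytes.
    static List<ByteBuffer> fanOut(String message, int peers) {
        ByteBuffer shared = serialize(message);
        List<ByteBuffer> views = new ArrayList<>(peers);
        for (int i = 0; i < peers; i++) {
            // duplicate() shares content but has an independent position and limit
            views.add(shared.duplicate());
        }
        return views;
    }

    public static void main(String[] args) {
        List<ByteBuffer> views = fanOut("payload", 3);
        System.out.println(views.size() + " views, serialize calls: " + serializeCalls);
    }
}
```

Memory for the payload is then paid once rather than once per target peer, and one peer consuming its view does not disturb the others.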

bq. I really wonder if that be a shared pool of threads and we size it generously

Yeah, I thought about this. The problem is that because the deserialization is blocking, you
basically need one thread in the pool for each "blocker"; else you starve some deserialization
activities. Hence, I just used a background thread. Not my favorite choice, but I'm not sure
a "well-sized" pool will be sufficient. 
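
The starvation concern can be shown with a small, self-contained sketch: when every task blocks (here on a latch that stands in for waiting on more bytes), a pool with fewer threads than blockers leaves the extra tasks sitting in the queue. `BlockingPoolSketch` and `starvedTasks` are hypothetical names, and a plain `ThreadPoolExecutor` is assumed in place of whatever executor the patch would use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: blocking tasks on a fixed-size shared pool.
class BlockingPoolSketch {
    // Returns how many tasks are stuck in the queue once every pool thread is blocked.
    static int starvedTasks(int poolSize, int blockers) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, blockers));
        CountDownLatch release = new CountDownLatch(1); // stands in for "more bytes arrived"
        try {
            for (int i = 0; i < blockers; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // simulated blocking deserialization
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await();               // every task that can run is now blocked
            return pool.getQueue().size(); // the rest cannot make progress
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks to start", e);
        } finally {
            release.countDown(); // unblock everything so the pool can drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("starved with 2 threads, 3 blockers: " + starvedTasks(2, 3));
    }
}
```

Only a pool with at least as many threads as concurrent blockers avoids queued tasks, which is effectively one thread per blocker.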

Reading over your comments on the code itself this morning.


> support large internode messages with netty
> -------------------------------------------
>
>                 Key: CASSANDRA-13630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
>             Project: Cassandra
>          Issue Type: Task
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
>
>
> As part of CASSANDRA-8457, we decided to punt on large messages to reduce the scope of
> that ticket. However, we still need that functionality to ship a correctly operating internode
> messaging subsystem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


