cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8457) nio MessagingService
Date Wed, 15 Feb 2017 16:54:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868091#comment-15868091
] 

Ariel Weisberg edited comment on CASSANDRA-8457 at 2/15/17 4:53 PM:
--------------------------------------------------------------------

{quote}
Thus, I propose to bring back the SwappingByteBufDataOutputStreamPlus that I had in an earlier
commit. To recap, the basic idea is provide a DataOutputPlus that has a backing ByteBuffer
that is written to, and when it is filled, it is written to the netty context and flushed,
then allocate a new buffer for more writes - kinda similar to a BufferedOutputStream, but
replacing the backing buffer when full. Bringing this idea back is also what underpins one
of the major performance things I wanted to address: buffering up smaller messages into one
buffer to avoid going back to the netty allocator for every tiny buffer we might need - think
Mutation acks.
{quote}

What thread is going to be writing to the output stream to serialize the messages? If it's
a Netty thread you can't block inside a serialization method waiting for the bytes to drain
to the socket that is not keeping up. You also can't wind out of the serialization method
and continue it later.

If it's an application thread then it's no longer asynchronous and a slow connection can block
the application and prevent it from doing say a quorum write to just the fast nodes. You would
also need to lock during serialization or queue concurrently sent messages behind the one
currently being written.

With large messages we aren't really fully eliminating the issue only making it a factor better.
At the other end you still need to materialize a buffer containing the message + the object
graph you are going to materialize. This is different from how things worked previously where
we had a dedicated thread that would read fixed size buffers and then materialize just the
object graph from that. 

To really solve this we need to be able to avoid buffering the entire message at both sending
and receiving side. The buffering is worse because we are allocating contiguous memory and
not just doubling the space impact. We could make it incrementally better by using chains
of fixed size buffers so there is less external fragmentation and allocator overhead. That's
still committing additional memory compared to pre-8457, but at least it's being committed
in a more reasonable way.

I think the most elegant solution is to use a lightweight thread implementation. What we will
probably be boxed into doing is making the serialization of result data and other large message
portions able to yield. This will bound the memory committed to large messages to the largest
atomic portion we have to serialize (Cell?).

Something like an output stream being able to say "shouldYield". If you continue to write
it will continue to buffer and not fail, but use memory. Then serializers can implement a
return value for serialize which indicates whether there is more to serialize. You would check
shouldYield after each Cell or some unit of work when serializing. Most of these large things
being serialized are iterators which could be stashed away. The trick will be that most serialization
is stateless, and objects are serialized concurrently so you can't stored the serialization
state in the object being serialized safely.

We also need to solve incremental non-blocking deserialization on the receive side and that
I don't know. That's even trickier because you don't control how the message is fragmented
so you can't insert the yield points trivially.


was (Author: aweisberg):
{quote}
Thus, I propose to bring back the SwappingByteBufDataOutputStreamPlus that I had in an earlier
commit. To recap, the basic idea is provide a DataOutputPlus that has a backing ByteBuffer
that is written to, and when it is filled, it is written to the netty context and flushed,
then allocate a new buffer for more writes - kinda similar to a BufferedOutputStream, but
replacing the backing buffer when full. Bringing this idea back is also what underpins one
of the major performance things I wanted to address: buffering up smaller messages into one
buffer to avoid going back to the netty allocator for every tiny buffer we might need - think
Mutation acks.
{quote}

What thread is going to be writing to the output stream to serialize the messages? If it's
a Netty thread you can't block inside a serialization method waiting for the bytes to drain
to the socket that is not keeping up. You also can't wind out of the serialization method
and continue it later.

If it's an application thread then it's no longer asynchronous and a slow connection can block
the application and prevent it from doing say a quorum write to just the fast nodes. You would
also need to lock during serialization or queue concurrently sent messages behind the one
currently being written.

With large messages we aren't really fully eliminating the issue only making it a factor better.
At the other end you still need to materialize a buffer containing the message + the object
graph you are going to materialize. This is different from how things worked previously where
we had a dedicated thread that would read fixed size buffers and then materialize just the
object graph from that. 

To really solve this we need to be able to avoid buffering the entire message at both sending
and receiving side. The buffering is worse because we are allocating contiguous memory and
not just doubling the space impact. We could make it incrementally better by using chains
of fixed size buffers so there is less external fragmentation and allocator overhead. That's
still committing additional memory compared to pre-8457, but at least it's being committed
in a more reasonable way.

I think the most elegant solution is to use a lightweight thread implementation. What we will
probably be boxed into doing is making the serialization of result data and other large message
portions able to yield. This will bound the memory committed to large messages to the largest
atomic portion we have to serialize (Cell?).

Something like an output stream being able to say "shouldYield". If you continue to write
it will continue to buffer and not fail, but use memory. Then serializers can implement a
return value for serialize which indicates whether there is more to serialize. You would check
shouldYield after each Cell or some unit of work when serializing. Most of these large things
being serialized are iterators. The trick will be that most serialization is stateless, and
objects are serialized concurrently so you can't stored the serialization state in object
being serialized safely.

We also need to solve incremental non-blocking deserialization on the receive side and that
I don't know. That's even trickier because you don't control how the message is fragmented
so you can't insert the yield points trivially.

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big contributor to context
switching, especially for larger clusters.  Let's look at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message