drill-issues mailing list archives

From "Sorabh Hamirwasia (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5508) Flow control in Drill RPC layer
Date Sat, 13 May 2017 00:19:04 GMT
Sorabh Hamirwasia created DRILL-5508:
----------------------------------------

             Summary: Flow control in Drill RPC layer
                 Key: DRILL-5508
                 URL: https://issues.apache.org/jira/browse/DRILL-5508
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - RPC
            Reporter: Sorabh Hamirwasia


Drill uses Netty to implement its RPC layer. Netty internally has a _ChannelOutboundBuffer_
where it stores all the data sent by the application when the TCP send buffer is full. Netty also
has the concepts of _WRITE_BUFFER_HIGH_WATER_MARK_ and _WRITE_BUFFER_LOW_WATER_MARK_, which are
configurable and indicate when the send buffer is full and when it can accept more data.
The channel writability is turned on/off based on these parameters, which the application can use
to make smart decisions. More information can be found [here|https://netty.io/4.1/api/io/netty/channel/WriteBufferWaterMark.html].
All of these together can help to implement flow control in Drill. Today the only flow control
in Drill is based on the number of batches sent without an ack (which is 3). But that doesn't
consider how much data is transferred as part of those batches. Without proper flow
control based on water marks, Drill is just overwhelming the pipeline.
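
To make the water mark behaviour concrete, here is a minimal sketch in plain Java. It is a toy model, not Netty's implementation: the class and method names are hypothetical, and it only mirrors the documented rule that writability turns off once pending bytes exceed the high water mark and turns back on once they drop below the low water mark.

```java
/** Toy model of high/low water mark writability; NOT Netty's actual code. */
final class WaterMarkModel {
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean writable = true;

    WaterMarkModel(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || highWaterMark < lowWaterMark) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /** The application queues bytes; writability turns off above the high mark. */
    void enqueue(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > highWaterMark) {
            writable = false;
        }
    }

    /** The socket drains bytes; writability turns back on below the low mark. */
    void drain(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < lowWaterMark) {
            writable = true;
        }
    }

    boolean isWritable() {
        return writable;
    }

    long pendingBytes() {
        return pendingBytes;
    }
}
```

The gap between the two marks gives hysteresis: once the channel becomes unwritable it stays that way until a meaningful amount of data has drained, so the writability flag does not flap on every small write.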

With Drill 1.11 support for SASL encryption, there is a new SaslEncryptionHandler inserted
in the Drill channel pipeline. This handler takes the Drill ByteBuf, encrypts it, and stores the
encrypted buffer (>= original buffer) in another ByteBuf. In this way the memory consumption
is doubled until the next handler in the pipeline is called, at which point the original buffer is released.
There is a risk that if multiple connections (say N) happen to encrypt large data
buffers (say of size D) at the same time, each will end up doubling its memory consumption
at that instant. The total memory consumption will be Mc = N*2D. The same total can be reached even without
encryption when the connection count is doubled (i.e. 2N), with each connection transferring D bytes of
data. In a memory-constrained environment this can be an issue if Mc is too large.
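
As a worked example of the Mc = N*2D estimate, the sketch below plugs in made-up numbers. The values of N and D are assumptions chosen only for illustration, and since the encrypted buffer is at least as large as the original, N*2D is a lower bound for the encrypted case.

```java
/** Back-of-the-envelope peak memory estimate; N and D are made-up example values. */
final class PeakMemoryEstimate {
    /** Encrypted case: each connection holds the original buffer plus a copy at least as large. */
    static long encryptedPeakBytes(long connections, long bufferBytes) {
        return Math.multiplyExact(connections, Math.multiplyExact(2L, bufferBytes));
    }

    /** Unencrypted case: each connection holds only the original buffer. */
    static long plainPeakBytes(long connections, long bufferBytes) {
        return Math.multiplyExact(connections, bufferBytes);
    }

    public static void main(String[] args) {
        long n = 100;                // hypothetical connection count N
        long d = 32L * 1024 * 1024;  // hypothetical buffer size D: 32 MiB
        long mib = 1024 * 1024;
        // Both lines print 6400 MiB: N*2D with encryption equals 2N*D without it.
        System.out.println("encrypted, N connections: " + encryptedPeakBytes(n, d) / mib + " MiB");
        System.out.println("plain, 2N connections:    " + plainPeakBytes(2 * n, d) / mib + " MiB");
    }
}
```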

To resolve the issues in both scenarios, flow control is required in the Drill
RPC layer. Basically we can configure High/Low Watermarks (based on % of ChannelOutboundBuffer)
and the ChannelOutboundBuffer size (a multiple of the chunk size) for Drill's channels. Then the application
thread, which today writes the entire message in one go, needs to split the message into smaller
chunks (possibly of configurable size). Based on the channel's writable state, one or more chunks should
be written to the socket. If the channel's writable state is false, the application thread will
block until it gets notified of the state change, at which point it can again send more chunks
downstream. In this way we achieve the following:
1) When encryption is disabled, Netty's ChannelOutboundBuffer will not be overwhelmed.
It will always have a steady flow of data to send over the network.
2) When encryption is enabled, we will always send smaller chunks to the pipeline
to encrypt rather than the entire data buffer. Memory is then doubled in smaller units, causing
less memory pressure.
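
The chunk-and-block scheme described above could be sketched roughly as follows. This is plain Java with a hypothetical Sink interface standing in for the channel. None of these names come from Drill or Netty, and a real implementation would hook onWritabilityChanged() into Netty's channelWritabilityChanged callback and work with ByteBuf slices rather than byte arrays.

```java
import java.util.Arrays;

/** Sketch of chunked writes gated on writability; all names here are hypothetical. */
final class ChunkedWriter {
    /** Hypothetical stand-in for a channel; isWritable() must be safe to call from any thread. */
    interface Sink {
        boolean isWritable();
        void write(byte[] chunk);
    }

    private final Object lock = new Object();
    private final Sink sink;
    private final int chunkSize;

    ChunkedWriter(Sink sink, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.sink = sink;
        this.chunkSize = chunkSize;
    }

    /** Call after the sink's writable state has changed, to wake blocked writers. */
    void onWritabilityChanged() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    /** Splits the message into chunks and blocks before each chunk while the sink is unwritable. */
    void writeMessage(byte[] message) {
        for (int off = 0; off < message.length; off += chunkSize) {
            int end = Math.min(message.length, off + chunkSize);
            synchronized (lock) {
                while (!sink.isWritable()) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted waiting for writability", e);
                    }
                }
            }
            sink.write(Arrays.copyOfRange(message, off, end));
        }
    }
}
```

With a 10-byte message and a chunk size of 4, the sink receives chunks of 4, 4 and 2 bytes. Note that a blocking wait like this only makes sense on an application thread. Blocking a Netty event loop thread would stall the very flush that makes the channel writable again.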

Note: This is just a high-level description of the problem and of a potential solution.
It needs more research/prototyping to come up with a proper solution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
