hadoop-common-issues mailing list archives

From "kartheek muthyala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13836) Securing Hadoop RPC using SSL
Date Tue, 07 Feb 2017 06:20:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855409#comment-15855409 ]

kartheek muthyala commented on HADOOP-13836:
--------------------------------------------

[~daryn], Thank you for the insightful feedback. :)

When SSL encrypts the data buffers, the length of the data on the wire differs from the length
of the actual data sent. For example, a 10-byte data packet can grow to 16 bytes after
encryption, depending on the cipher used. When a Hadoop RPC is sent on a channel, we first read
a 4-byte length field so that we know in advance how much data to read. So in the current
readAndProcess, when we replace the socket channel with SSLServerSocketChannel, channelRead may
return partial data that is not yet enough to decode the length field, let alone the data. For
example, a call to SSLSocketChannel.read() might yield only 3 plaintext bytes even though 8
bytes were consumed from the channel, and 3 bytes are not enough to decode the 4-byte length
field. This mismatch between wire length and plaintext length is what led me to modify
readAndProcess to loop continuously until we have enough data. This could probably be
simplified by another class that extends SSLServerSocketChannel and buffers at a layer under
readAndProcess, which would avoid the extra looping in readAndProcess itself. I will file an
improvement on top of this jira to verify whether that abstraction is possible. But even with
that extra interface, we still have to loop for the data, because of the same length mismatch.
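
To make the length mismatch concrete, here is a minimal sketch of the accumulation a
TLS-wrapped read path has to do. The class and method names are mine, for illustration only;
the actual patch wires this into the server's connection handling differently.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/**
 * Illustrative only: accumulate plaintext from a TLS-wrapped channel until
 * the 4-byte length prefix and the payload it announces are both complete.
 * A single read() may surface fewer plaintext bytes than were consumed off
 * the wire, so one read is never guaranteed to be enough.
 */
public class LengthPrefixedReader {
  private final ByteBuffer lengthBuf = ByteBuffer.allocate(4);
  private ByteBuffer dataBuf;

  /** Returns a complete request payload, or null if more bytes are needed. */
  public ByteBuffer readRequest(ReadableByteChannel ch) throws IOException {
    if (dataBuf == null) {
      if (ch.read(lengthBuf) < 0) {
        throw new IOException("connection closed mid-request");
      }
      if (lengthBuf.hasRemaining()) {
        return null; // e.g. only 3 of the 4 length bytes decoded so far
      }
      lengthBuf.flip();
      dataBuf = ByteBuffer.allocate(lengthBuf.getInt());
    }
    if (ch.read(dataBuf) < 0) {
      throw new IOException("connection closed mid-request");
    }
    if (dataBuf.hasRemaining()) {
      return null; // payload still partial; caller loops or re-selects
    }
    dataBuf.flip();
    ByteBuffer request = dataBuf;
    dataBuf = null;      // reset state for the next request
    lengthBuf.clear();
    return request;
  }
}
{code}

Buffering at this layer, under readAndProcess, is exactly the abstraction mentioned above; the
loop itself cannot be removed, only relocated.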


Multi-threaded clients generating requests faster than they can be read will indefinitely tie
up a reader.
- I am not sure the reader gets tied up indefinitely; the requests will get processed
eventually.

Clients sending a slow trickle of bytes will tie up a reader until a request is fully read.
- This problem already exists today, whenever large data packets are sent and the server
processes them with ChannelIO.

Clients stalled mid-request will cause the reader to go into a spin loop.
- The connection timeout on a stalled client leads to closure of the channel, which breaks the
spin loop (see the sketch below).
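
A rough sketch of why the timeout bounds the loop, reusing the LengthPrefixedReader sketch
above. All names are hypothetical, and a real server would park on a selector rather than
poll; this only shows that a stalled client costs at most one timeout interval before the
channel is closed.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

/** Illustrative only: a read loop bounded by the connection timeout. */
public class BoundedReadLoop {
  static ByteBuffer readWithTimeout(LengthPrefixedReader reader,
                                    SocketChannel channel,
                                    long idleTimeoutMs) throws IOException {
    long deadline = System.currentTimeMillis() + idleTimeoutMs;
    ByteBuffer request;
    while ((request = reader.readRequest(channel)) == null) {
      if (System.currentTimeMillis() > deadline) {
        channel.close(); // stalled mid-request: dropping the connection
        return null;     // is what ends the would-be spin loop
      }
    }
    return request;
  }
}
{code}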


[~wheat9], The performance study quoted in the link was run on a setup where clients interface
with frontend machines that terminate HTTPS. They point out that "On our production frontend
machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per
connection and less than 2% of network overhead.", so it is roughly 3% overall for them too,
including the network overhead of handshaking. I am not sure this is an apples-to-apples
comparison with the setup on which I took my performance numbers: the CPU speed for encoding
and decoding, the SSL protocol version used, the network bandwidth between the machines, the
workload characteristics, etc. may all have differed between the two setups.

> Securing Hadoop RPC using SSL
> -----------------------------
>
>                 Key: HADOOP-13836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13836
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: kartheek muthyala
>            Assignee: kartheek muthyala
>         Attachments: HADOOP-13836.patch, HADOOP-13836-v2.patch, HADOOP-13836-v3.patch,
> HADOOP-13836-v4.patch, Secure IPC OSS Proposal-1.pdf, SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication & Security
> Layer (SASL), with Kerberos ticket-based or DIGEST-MD5 checksum-based authentication. This
> proposal is about enhancing that cipher suite with SSL/TLS-based encryption and
> authentication. SSL/TLS is a proposed Internet Engineering Task Force (IETF) standard that
> provides data security and integrity between two endpoints in a network. The protocol has
> made its way into a number of applications such as web browsing, email, internet faxing,
> messaging, VoIP, etc. Supporting this cipher suite at the core of Hadoop would give good
> synergy with the applications on top and bolster industry adoption of Hadoop.
> The Server and Client code in Hadoop IPC should support the following modes of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL-based encryption and authentication (x509 certificate)
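
For illustration only, the three proposed modes could be modeled along these lines; this enum
is hypothetical and not taken from the attached patches.

{code:java}
/** Illustrative only: the three proposed IPC channel modes (not from the patch). */
public enum IpcChannelMode {
  PLAIN,           // no channel-level encryption or authentication
  SASL_ENCRYPTED,  // SASL wrapping over Kerberos or DIGEST-MD5 authentication
  SSL_TLS          // SSL/TLS encryption with x509 certificate authentication
}
{code}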



