tomcat-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1
Date Sun, 25 Nov 2018 09:42:19 GMT
When building tcnative against OpenSSL 1.1.1, I ran into hangs when 
talking TLS 1.0 to Tomcat trunk using that tcnative build plus 
Nio(2).

A simple "GET /" request, e.g. sent with curl, hangs for 60 seconds 
after a successful TLS handshake, then the client ends with an "empty 
reply from server".

You can also reproduce with openssl s_client. The request will hang 
until you send an additional empty line (on top of the usual empty 
line ending the request). That extra line then triggers another read, 
which finds the old request data and handles it.

The problem does not occur with the APR connector. APR and Nio(2) seem 
to use very different code paths in tcnative for TLS handling 
(sslnetwork.c versus ssl.c).

I have some understanding of the root cause but currently no good idea 
how to fix it. The root cause is incorrect handling of SSL_read() when 
it returns "0". The OpenSSL man page has a relevant description at [1]. 
As also observed in mod_ssl (Apache web server), OpenSSL 1.1.1 behaves 
differently from older versions in that it can return "0" where older 
versions returned "-1". That was always documented as a possibility, 
but now it really happens in practice. The tcnative code used by APR 
handles this in the native part. The code used by Nio(2) simply returns 
the value it gets from SSL_read() and leaves it to the calling Java 
code to handle it. netty, from which we borrowed the ideas for Java 
plus OpenSSL, does include such handling in 
ReferenceCountedOpenSslEngine.java, especially for 
SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE.
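
To make the missing step concrete, here is a minimal sketch of how a 
Java caller could map an SSL_read() result plus the following 
SSL_get_error() code to a next action. The class, enum and method names 
are hypothetical and not part of Tomcat, tcnative or netty; only the 
numeric SSL_ERROR_* values are taken from OpenSSL. How 
SSL_ERROR_SYSCALL should really be treated is an open question, so the 
sketch only flags it.

```java
// Sketch only: SslReadOutcome, Action and classify() are hypothetical names,
// not Tomcat, tcnative or netty API.
final class SslReadOutcome {
    // Numeric values as defined by OpenSSL for SSL_get_error().
    static final int SSL_ERROR_NONE = 0;
    static final int SSL_ERROR_SSL = 1;
    static final int SSL_ERROR_WANT_READ = 2;
    static final int SSL_ERROR_WANT_WRITE = 3;
    static final int SSL_ERROR_SYSCALL = 5;
    static final int SSL_ERROR_ZERO_RETURN = 6;

    enum Action {
        DATA,               // plaintext bytes were returned
        NEED_MORE_INPUT,    // feed more encrypted bytes, then read again
        NEED_TO_FLUSH,      // drain pending encrypted output, then read again
        CLOSED,             // peer sent close_notify
        UNEXPECTED_SYSCALL, // the unresolved case: result 0 with error 5
        FAILED              // any other error
    }

    /**
     * @param readResult value returned by SSL_read()
     * @param sslError   value returned by the following SSL_get_error()
     */
    static Action classify(int readResult, int sslError) {
        if (readResult > 0) {
            return Action.DATA;
        }
        switch (sslError) {
            case SSL_ERROR_WANT_READ:
                return Action.NEED_MORE_INPUT;
            case SSL_ERROR_WANT_WRITE:
                return Action.NEED_TO_FLUSH;
            case SSL_ERROR_ZERO_RETURN:
                return Action.CLOSED;
            case SSL_ERROR_SYSCALL:
                // Assumption: only flagged here, not resolved.
                return Action.UNEXPECTED_SYSCALL;
            default:
                return Action.FAILED;
        }
    }
}
```

The point of the sketch is only that a result of "0" is not treated as 
"no data" without first looking at the error code.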

I could have experimented with their approach, but there seems to be 
another problem that makes it harder. The relevant call to SSL_read() 
returns "0", but a following SSL_get_error() does not return WANT_READ 
or WANT_WRITE; instead it returns "5", which is SSL_ERROR_SYSCALL. I do 
not have a good idea where this comes from. When tracing system calls, 
it seems to come from an EAGAIN in a socket read, but I am not sure 
about that.

In our Java code, what happens is a call to unwrap() in OpenSSLEngine. 
This call writes, I think, 146 bytes, then checks 
pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL() 
and gets back "0" (from SSL_read()). Back up in unwrap() we then skip 
the while loop and finally return with BUFFER_UNDERFLOW. Then we hang, 
probably because the data has already been read by OpenSSL and no 
further socket event occurs. If I artificially add another call to 
pendingReadableBytesInSSL(), which triggers another SSL_read(), the 
hang does not occur.
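
The sequence above can be illustrated with a toy model. This is not the 
OpenSSLEngine code: FakeSsl, its methods and the readAttempts parameter 
are hypothetical, and the fake engine is simply hard-wired to return 
"0" on the first read after new bytes were written, which mimics the 
observed behaviour without explaining it.

```java
// Toy model only: all names are hypothetical, nothing here is Tomcat code.
final class UnwrapHangSketch {

    /** Stand-in for the native SSL object. */
    static final class FakeSsl {
        private int buffered;          // encrypted bytes handed over so far
        private boolean readAttempted; // whether a read happened since the last write

        void writeEncrypted(int count) {
            buffered += count;
            readAttempted = false;
        }

        /** Returns 0 on the first read after a write, the buffered count afterwards. */
        int read() {
            if (buffered == 0) {
                return 0;
            }
            if (!readAttempted) {
                readAttempted = true;
                return 0; // models the "0" seen from SSL_read()
            }
            int count = buffered;
            buffered = 0;
            return count;
        }
    }

    enum Status { OK, BUFFER_UNDERFLOW }

    /** Writes netBytes into the fake engine, then tries to read up to readAttempts times. */
    static Status unwrap(FakeSsl ssl, int netBytes, int readAttempts) {
        ssl.writeEncrypted(netBytes);
        for (int i = 0; i < readAttempts; i++) {
            if (ssl.read() > 0) {
                return Status.OK;
            }
        }
        // The bytes are already consumed from the socket, so no new
        // socket event will wake the caller up again.
        return Status.BUFFER_UNDERFLOW;
    }

    public static void main(String[] args) {
        System.out.println(unwrap(new FakeSsl(), 146, 1));
        System.out.println(unwrap(new FakeSsl(), 146, 2));
    }
}
```

With one read attempt the model reports BUFFER_UNDERFLOW even though 
the data is already inside the engine. With a second attempt it reports 
OK, which corresponds to the artificially added call to 
pendingReadableBytesInSSL().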

IMHO TLS 1.0 is not such a big problem, but we should at least document 
it when we do a new release.

I might drill down further by debugging the native layer (checking 
errno etc.), but I am not sure I will find the time.

[1]: https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html
