kafka-dev mailing list archives

From "Joel Koshy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4050) Allow configuration of the PRNG used for SSL
Date Wed, 17 Aug 2016 04:00:24 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423839#comment-15423839 ]

Joel Koshy commented on KAFKA-4050:

A stack trace should help clarify further. (This is from a thread dump that Todd shared with
us offline.) Thanks [~toddpalino] and [~mgharat] for finding this.

"kafka-network-thread-1393-SSL-30" #114 prio=5 os_prio=0 tid=0x00007f2ec8c30800 nid=0x5c1e waiting for monitor entry [0x00007f213b8f9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:481)
	- waiting to lock <0x0000000641508bf8> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:329)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:218)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
	- locked <0x000000066aad9880> (a java.security.SecureRandom)
	at sun.security.ssl.CipherBox.createExplicitNonce(CipherBox.java:1015)
	at sun.security.ssl.EngineOutputRecord.write(EngineOutputRecord.java:287)
	at sun.security.ssl.EngineOutputRecord.write(EngineOutputRecord.java:225)
	at sun.security.ssl.EngineWriter.writeRecord(EngineWriter.java:186)
	- locked <0x0000000671c5c978> (a sun.security.ssl.EngineWriter)
	at sun.security.ssl.SSLEngineImpl.writeRecord(SSLEngineImpl.java:1300)
	at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1271)
	- locked <0x0000000671ce7170> (a java.lang.Object)
	at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
	- locked <0x0000000671ce7150> (a java.lang.Object)
	at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
	at org.apache.kafka.common.network.SslTransportLayer.write(p.java:557)
	at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:146)
	at org.apache.kafka.common.network.MultiSend.writeTo(MultiSend.java:81)
	at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:292)
	at org.apache.kafka.common.network.KafkaChannel.send(KafkaChannel.java:158)
	at org.apache.kafka.common.network.KafkaChannel.write(KafkaChannel.java:146)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:329)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
	at kafka.network.Processor.poll(SocketServer.scala:472)
	at kafka.network.Processor.run(SocketServer.scala:412)
	at java.lang.Thread.run(Thread.java:745)

Of note is that all of the network threads are waiting on the same NativePRNG lock (0x0000000641508bf8).
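
The access pattern behind that trace can be sketched in a few lines. This is only an illustration, not Kafka code: the class name, the method name, and the 8-byte buffer size are made up for the example, and the timings it prints depend entirely on the JDK, the OS, and the hardware. It starts several threads that all draw small random buffers from one shared SecureRandom, which is roughly what the network threads do when each SSL record needs a fresh nonce.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness (not part of Kafka): many threads, one shared SecureRandom.
final class SharedPrngContentionSketch {

    // Starts `threads` workers that each call nextBytes() `callsPerThread` times
    // on the SAME SecureRandom instance. Returns the number of completed calls.
    static long drawNonces(SecureRandom shared, int threads, int callsPerThread) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                byte[] nonce = new byte[8]; // buffer size chosen arbitrarily for the sketch
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int c = 0; c < callsPerThread; c++) {
                    shared.nextBytes(nonce); // every worker funnels through the same generator
                    completed.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return completed.get();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Compare the platform default with SHA1PRNG; results vary by environment.
        SecureRandom[] candidates = { new SecureRandom(), SecureRandom.getInstance("SHA1PRNG") };
        for (SecureRandom random : candidates) {
            long begin = System.nanoTime();
            long calls = drawNonces(random, 8, 100_000);
            long elapsedMs = (System.nanoTime() - begin) / 1_000_000;
            System.out.println(random.getAlgorithm() + ": " + calls + " calls in " + elapsedMs + " ms");
        }
    }
}
```

While a run like this is in progress, a thread dump (for example via jstack) should show whether the workers are parked on a common monitor, as the network threads are in the trace above.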

> Allow configuration of the PRNG used for SSL
> --------------------------------------------
>                 Key: KAFKA-4050
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4050
>             Project: Kafka
>          Issue Type: Improvement
>          Components: security
>    Affects Versions:
>            Reporter: Todd Palino
>            Assignee: Todd Palino
>              Labels: security, ssl
> This change will make the pseudo-random number generator (PRNG) implementation used by
> the SSLContext configurable. The configuration is not required, and the default is to use
> whatever the default PRNG for the JDK/JRE is. Providing a string, such as "SHA1PRNG", will
> cause that specific SecureRandom implementation to get passed to the SSLContext.
> When enabling inter-broker SSL in our certification cluster, we observed severe performance
> issues. For reference, this cluster can take up to 600 MB/sec of inbound produce traffic over
> SSL, with RF=2, before it gets close to saturation, and the mirror maker normally produces
> about 400 MB/sec (unless it is lagging). When we enabled inter-broker SSL, we saw persistent
> replication problems in the cluster at any inbound rate of more than about 6 or 7 MB/sec per broker.
> This was narrowed down to all the network threads blocking on a single lock in SecureRandom.
> It turns out that the default PRNG implementation on Linux is NativePRNG. This uses randomness
> from /dev/urandom (which, by itself, is a non-blocking read) and mixes it with randomness
> from SHA1. The problem is that the entire application shares a single SecureRandom instance,
> and NativePRNG has a global lock within the implNextBytes method. Switching to another implementation
> (SHA1PRNG, which has better performance characteristics and is still considered secure) completely
> eliminated the bottleneck and allowed the cluster to work properly at saturation.
> The SSLContext initialization has an optional argument to provide a SecureRandom instance,
> which the code currently sets to null. This change creates a new config to specify an implementation,
> and instantiates that and passes it to SSLContext if provided. This will also let someone
> select a stronger source of randomness (obviously at a performance cost) if desired.
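
For readers who want to see the shape of the proposed change, here is a minimal sketch using only standard JDK calls (SecureRandom.getInstance and SSLContext.init). It is not the actual patch: the class and method names are hypothetical, the configuration value is passed in as a plain string rather than read from a Kafka config, and the key and trust managers are left null (JDK defaults) where a broker would supply its own.

```java
import java.security.KeyManagementException;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import javax.net.ssl.SSLContext;

// Hypothetical helper (not the Kafka patch): optional PRNG selection for an SSLContext.
final class SslPrngSketch {

    // Returns null when nothing is configured, so SSLContext falls back to the JDK default.
    static SecureRandom createSecureRandom(String implementation) {
        if (implementation == null || implementation.isEmpty()) {
            return null;
        }
        try {
            return SecureRandom.getInstance(implementation); // e.g. "SHA1PRNG"
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(
                    "Unknown SecureRandom implementation: " + implementation, e);
        }
    }

    // Builds an SSLContext and hands it the configured SecureRandom (or null).
    static SSLContext createSslContext(String protocol, String prngImplementation) {
        try {
            SSLContext context = SSLContext.getInstance(protocol);
            // Assumption for the sketch: null key/trust managers select the JDK defaults.
            context.init(null, null, createSecureRandom(prngImplementation));
            return context;
        } catch (NoSuchAlgorithmException | KeyManagementException e) {
            throw new IllegalStateException("Could not initialize SSLContext", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = createSslContext("TLS", "SHA1PRNG");
        System.out.println("Initialized SSLContext for protocol " + context.getProtocol());
    }
}
```

Passing null or an empty string for the implementation keeps today's behavior, which matches the description's point that the setting is optional.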

This message was sent by Atlassian JIRA
