ignite-user mailing list archives

From Ilya Kasnacheev <ilya.kasnach...@gmail.com>
Subject Re: Cluster freeze with SSL enabled and JDK 11
Date Wed, 13 Feb 2019 13:58:41 GMT
Hello!

For TLSv1.2 on Windows the fix is ready and tests are running for it. Hope
that it will be integrated soon.

Regards,
-- 
Ilya Kasnacheev


On Tue, Feb 12, 2019 at 20:46, Loredana Radulescu Ivanoff <lradules@tibco.com>
wrote:

> Thank you very much for the info, it was very helpful.
>
> I assume it worked on Linux because I specifically set TLS v1.2 as a JVM
> argument, by specifying -Djdk.tls.server.protocols="TLSv1.2"
> -Djdk.tls.client.protocols="TLSv1.2"
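For readers who would rather pin the protocol in code than through the JVM flags above, here is a minimal sketch using plain JSSE. It is not Ignite's own configuration path: the class name `Tls12Only` and the helper `newTls12Engine` are hypothetical names chosen for illustration, and the engine is built from the JVM's default key and trust material.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.util.Arrays;

public class Tls12Only {
    /** Builds an SSLEngine that will only negotiate TLSv1.2 (hypothetical helper). */
    static SSLEngine newTls12Engine() throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLSv1.2");
        // null arguments select the default key managers, trust managers and SecureRandom
        ctx.init(null, null, null);
        SSLEngine engine = ctx.createSSLEngine();
        // Explicitly restrict the engine to TLSv1.2 regardless of the JVM defaults
        engine.setEnabledProtocols(new String[] {"TLSv1.2"});
        return engine;
    }

    public static void main(String[] args) throws Exception {
        // Print the protocols the engine is willing to negotiate
        System.out.println(Arrays.toString(newTls12Engine().getEnabledProtocols()));
    }
}
```

Restricting the engine explicitly means the result does not depend on which system properties the JVM was started with.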
>
> Would you be able to provide a (very) loose estimate for the fix? Is it
> likely to go into 2.8?
>
> Thank you again!
>
> On Tue, Feb 12, 2019 at 7:10 AM Ilya Kasnacheev <ilya.kasnacheev@gmail.com>
> wrote:
>
>> Hello!
>>
>> It seems that your problems are due to not just one but two issues:
>>
>> 1) Java 11 enables TLSv1.3 by default, and Ignite does not support that:
>> https://issues.apache.org/jira/browse/IGNITE-11298
>> Why it worked for you on CentOS is a mystery. For some reason my Ubuntu
>> has Java 10 in the openjdk-11-jdk package, and it worked there. When I
>> manually installed a proper Java 11, it failed on Linux just the same as on
>> Windows. Falling back to TLSv1.2 could help, but,
>>
>> 2) on Windows, SSL fails to work on Java 11 due to a mistake in Ignite's NIO
>> code. I have also created a ticket and am currently devising a patch:
>> https://issues.apache.org/jira/browse/IGNITE-11299
>> More details are in JIRA.
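A quick way to see which Java runtime and which TLS defaults a node actually gets (relevant to the Java 10 vs. Java 11 packaging surprise in point 1) is a small JSSE probe. This is only a sketch and not part of Ignite; the class name `DefaultTlsProtocols` and the helper `defaultProtocols` are hypothetical.

```java
import javax.net.ssl.SSLContext;
import java.util.Arrays;
import java.util.List;

public class DefaultTlsProtocols {
    /** Returns the protocols the default SSLContext enables out of the box. */
    static List<String> defaultProtocols() throws Exception {
        return Arrays.asList(
            SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
    }

    public static void main(String[] args) throws Exception {
        List<String> protocols = defaultProtocols();
        // The runtime version shows whether the package really ships Java 11
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("default protocols = " + protocols);
        System.out.println("TLSv1.3 enabled by default: " + protocols.contains("TLSv1.3"));
    }
}
```

Running it with the same JVM and the same -D options as the Ignite node should show whether TLSv1.3 is still among the defaults.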
>>
>> I'm afraid your options are limited on Windows - use older Java or move
>> to Linux.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> On Fri, Feb 8, 2019 at 02:31, Loredana Radulescu Ivanoff <
>> lradules@tibco.com> wrote:
>>
>>> Hello,
>>>
>>> I would like to restart this topic because I can get a repro on Windows
>>> 10 with Java 11 and SSL enabled by starting two nodes using just the 2.7
>>> Ignite distribution. I'm starting the Ignite nodes via ignite.bat, and I've
>>> only added a few extra JVM options to allow Ignite to start with Java 11,
>>> as follows:
>>>
>>> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
>>> --add-exports=java.base/sun.nio.ch=ALL-UNNAMED
>>> -Djdk.tls.server.protocols="TLSv1.2" -Djdk.tls.client.protocols="TLSv1.2"
>>> -Djdk.tls.acknowledgeCloseNotify=true -DIGNITE_QUIET=false
>>> -DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=60000
>>>
>>> I'm attaching the logs from work/log and the configuration I've used.
>>> Could you please take a look and let me know if you see something wrong in
>>> the configuration, or a possible explanation?
>>>
>>> What is also interesting is that I used the same setup on two CentOS
>>> machines, and the same type of configuration, and the nodes do connect
>>> (with SSL and Java 11), without any errors. Could there be a platform issue
>>> here?
>>>
>>> Additionally, I confirmed that the nodes are able to connect as expected
>>> on both Windows and CentOS when SSL is disabled (I used the same
>>> configuration, but with the sslContextFactory bean commented out).
>>>
>>> Any help on the issue would be greatly appreciated. Thank you!
>>>
>>>
>>>
>>> On Thu, Oct 18, 2018 at 2:56 PM Loredana Radulescu Ivanoff <
>>> lradules@tibco.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and
>>>> SSL enabled:
>>>>
>>>>
>>>>    - the second node that I bring up joins, and then shortly after
>>>>    freezes and prints this message every minute:
>>>>
>>>> "WARN ...[*Initialization*]
>>>> processors.cache.GridCachePartitionExchangeManager: Still waiting for
>>>> initial partition map exchange"
>>>>
>>>>
>>>>    - once the second node joins, the first node starts experiencing
>>>>    very frequent 100% CPU spikes; these are the messages I see:
>>>>
>>>> WARN 2018-10-18T13:50:52,728-0700 []
>>>> communication.tcp.TcpCommunicationSpi: Communication SPI session write
>>>> timed out (consider increasing 'socketWriteTimeout' configuration property)
>>>> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
>>>> WARN 2018-10-18T13:50:52,737-0700 []
>>>> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
>>>> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
>>>> engine status after closeOutbound call [status=OK,
>>>> handshakeStatus=NEED_WRAP,
>>>> WARN 2018-10-18T13:51:01,441-0700 []
>>>> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
>>>> release latch within timeout: ServerLatch [permits=1,
>>>> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
>>>> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>>>>
>>>> Other observations:
>>>>
>>>> I can reproduce this every time I start the nodes, and it doesn't
>>>> matter which node comes up first.
>>>>
>>>>
>>>> The issue goes away if I disable SSL.
>>>>
>>>>
>>>> Increasing the socketWriteTimeout, networkTimeout or the
>>>> failureDetectionTimeout does not help.
>>>>
>>>> It seems to be happening only with JDK 11, and not with JDK 8.
>>>>
>>>>
>>>> Do you have any suggestions/known issues about this?
>>>>
>>>> Thank you,
>>>>
>>>> Loredana
>>>>
>>>>
>>>>
>>>>
>>>>
