From: Sergio Bilello
Reply-To: user@cassandra.apache.org
Date: Mon, 14 Oct 2019 21:34:49 -0000
Subject: Cassadra node join problem

Problem: The Cassandra node does not work even after a restart. It throws this exception:

WARN [Thread-83069] 2019-10-11 16:13:23,713 CustomTThreadPoolServer.java:125 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Socket closed
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:134) [apache-cassandra-3.11.4.jar:3.11.4]

The CPU load goes to 50 and the node becomes unresponsive.

Node configuration:

OS: Linux 4.16.13-1.el7.elrepo.x86_64 #1 SMP Wed May 30 14:31:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

The limits below are from a working node. It does not have the recommended settings, but it works, and it is one of the first nodes in the cluster.

cat /proc/23935/limits

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             122422               122422               processes
Max open files            65536                65536                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       122422               122422               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I tried to bootstrap a new node that joins the existing cluster.
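
A limits dump like the one above can also be checked programmatically. The following is a minimal sketch and not part of the original report: it parses the text of /proc/<pid>/limits and flags entries whose soft limit is below a threshold. The function names and the threshold values (100000 open files, 32768 processes, unlimited locked memory and address space) are assumptions based on commonly cited Cassandra production recommendations; check them against the documentation for your version.

```python
import re

# Assumed thresholds (not from the original message); None means "unlimited".
RECOMMENDED = {
    "Max open files": 100000,
    "Max processes": 32768,
    "Max locked memory": None,
    "Max address space": None,
}

# A limit row: a name starting with "Max", then the soft and hard values.
LINE_RE = re.compile(r"^(Max [a-z ]+?)\s+(unlimited|\d+)\s+(unlimited|\d+)")


def parse_limits(text):
    """Parse /proc/<pid>/limits text into {name: (soft, hard)}.

    The value 'unlimited' is represented as None.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    limits = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # the header row does not match and is skipped
            name, soft, hard = match.groups()
            limits[name] = (to_int(soft), to_int(hard))
    return limits


def below_recommended(limits, recommended=RECOMMENDED):
    """Return the names whose soft limit is lower than the assumed threshold."""
    flagged = []
    for name, wanted in recommended.items():
        if name not in limits:
            continue
        soft, _hard = limits[name]
        if soft is None:
            continue  # an unlimited soft limit satisfies any threshold
        if wanted is None or soft < wanted:
            flagged.append(name)
    return flagged
```

For a live process, read the file first, for example with open("/proc/23935/limits").read(), and pass the text to parse_limits.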
Disk usage is around 400 GB out of 885 GB available, on SSD.

At my first attempt, the node failed and was restarted over and over by systemctl, which does not honor the specified limits configuration, and it threw:

Caused by: java.nio.file.FileSystemException: /mnt/cassandra/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-52-big-Index.db: Too many open files
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_161]
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_161]
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_161]
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[na:1.8.0_161]
    at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[na:1.8.0_161]
    at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[na:1.8.0_161]
    at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:104) ~[apache-cassandra-3.11.4.jar:3.11.4]
    .. 20 common frames omitted
^C

I fixed the above by stopping Cassandra, cleaning the commitlog, saved_caches, hints and data directories, restarting it, getting the PID, and running the two commands below:

sudo prlimit -n1048576 -p
sudo prlimit -u32768 -p

I did this because at the beginning the node did not even join the cluster; it was reported as UJ. After I fixed the max open files problem, the node went from Up/Joining (UJ) to Up/Normal (UN).

The node joined the cluster, but after a while it started to throw:

WARN [Thread-83069] 2019-10-11 16:13:23,713 CustomTThreadPoolServer.java:125 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Socket closed
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:109) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:113) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:134) [apache-cassandra-3.11.4.jar:3.11.4]

I compared cassandra.yaml and limits.conf between the nodes, but that did not help. I don't know how the current nodes are working, since they don't have the recommended Cassandra limits.

Any suggestions on the possible culprit? Please let me know.

Thanks
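
The comparison of cassandra.yaml and limits.conf between a working node and the new node, mentioned above, can be made less error-prone with a small script. This is a minimal sketch and not what was actually run: it drops comments and blank lines from two config texts and prints a unified diff of what remains, using Python's standard difflib module. The function names and the labels "working-node" and "new-node" are hypothetical, and the comment stripping is naive (it would also cut a '#' that appears inside a quoted value).

```python
import difflib


def effective_lines(text):
    """Return config lines with '#' comments and blank lines removed."""
    lines = []
    for raw in text.splitlines():
        # Naive comment stripping: everything after the first '#' is dropped.
        line = raw.split("#", 1)[0].rstrip()
        if line.strip():
            lines.append(line)
    return lines


def config_diff(a_text, b_text, a_name="working-node", b_name="new-node"):
    """Return unified-diff lines between the effective settings of two configs."""
    a_lines = effective_lines(a_text)
    b_lines = effective_lines(b_text)
    # n=0 omits context lines, so only differing settings are listed.
    return list(
        difflib.unified_diff(
            a_lines, b_lines, fromfile=a_name, tofile=b_name, lineterm="", n=0
        )
    )
```

Reading the same file from both nodes and printing "\n".join(config_diff(a, b)) lists only the settings that differ; an empty result means the effective settings are identical.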