zookeeper-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: zookeeper / solr cloud problems
Date Fri, 13 Dec 2019 20:36:52 GMT
On 12/13/2019 11:01 AM, Kojo wrote:
> We had already changed the OS configuration before the last crash, so I
> think that the problem is not there.
> 
> ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 257683
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 65535
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 65535
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited

Are you running this ulimit command as the same user that is running 
your Solr process?  It must be the same user to learn anything useful. 
This output indicates that the user that's running the ulimit command is 
allowed to start 64K processes, which I would think should be enough.

Best guess here is that the actual user that's running Solr does *NOT* 
have its limits increased.  It may be a different user than you're using 
to run the ulimit command.
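
One way to check this without switching users is to read the limits of the running process itself. On Linux the kernel exposes them in /proc/<pid>/limits. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the usual column layout of that file (name, soft limit, hard limit, units, separated by runs of spaces).

```python
import re
from pathlib import Path


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are padded with two or more spaces; limit names
        # themselves contain only single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def limits_for_pid(pid):
    """Read and parse the limits of a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())
```

Calling limits_for_pid() with the PID of the Solr JVM and looking at the "Max processes" and "Max open files" entries shows what that process is actually allowed to do, regardless of what ulimit reports in your own shell.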

> When Solr tries to delete a znode? I am sorry, because I understand nothing
> about this process, and it is the only point that seems suspicious to me.
> Do you think that it can cause inconsistency leading to the OOM problem?

OOME isn't caused by inconsistencies at the application level.  It's a 
low-level problem, an indication that Java tried to do something 
required to run the program that it couldn't do.

I assume that it's Solr trying to delete the znode, because the node 
path has solr in it.  It will be the ZK client running inside Solr 
that's actually trying to do the work, but Solr code probably initiated it.

> Just after this INFO message above, the ZK log starts to log thousands of
> copies of the block of lines below, where it seems that ZK creates and
> closes thousands of sessions.

I responded to this thread because I have some knowledge about Solr.  I 
really have no idea what these additional ZK server logs might mean. 
The one that you quoted before was pretty straightforward, so I was able 
to understand it.

Anything that gets logged after an OOME is suspect and may be useless. 
The execution of a Java program after OOME is unpredictable, because 
whatever was being run when the OOME was thrown did NOT successfully 
execute.
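
When going through the logs, it can help to separate what was logged before the first OOME from what came after. This is a rough sketch, not a Solr or ZooKeeper tool: the function name is hypothetical, and it assumes the log contains the exception class name java.lang.OutOfMemoryError on the line where the error was first reported.

```python
def split_at_first_oome(lines, marker="java.lang.OutOfMemoryError"):
    """Split log lines at the first line that mentions the marker.

    Returns (before, from_oome): the lines logged before the first
    OOME, and the OOME line together with everything after it.
    """
    lines = list(lines)
    for i, line in enumerate(lines):
        if marker in line:
            return lines[:i], lines[i:]
    # No OOME found: everything counts as "before".
    return lines, []
```

The first part is the one worth reading closely. The second part may still contain hints, but nothing in it should be taken at face value.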

Thanks,
Shawn
