zookeeper-user mailing list archives

From Kojo <rbsnk...@gmail.com>
Subject Re: zookeeper / solr cloud problems
Date Sat, 14 Dec 2019 00:09:35 GMT
Shawn,
unfortunately, these ulimit values are for the solr user. I already checked
for the zk user; we set the same values there.
There is no constraint on process creation.
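
For reference, on Linux the effective limits of the running processes can be
read straight from /proc (a sketch; the pgrep patterns are assumptions and
may need adjusting for this install):

   # check the limits the running JVMs actually got, not just the shell's
   for pid in $(pgrep -f QuorumPeerMain) $(pgrep -f start.jar); do
       echo "== PID $pid =="
       grep -E "processes|open files" /proc/$pid/limits
   done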

This box has 128 GB of RAM, and Solr starts with a 32 GB heap. There is only
one small collection of ~400k documents.

I see no resource constraints.
I see nothing wrong at the application level (Python).

I am looking for any clue to solve this problem.

Would it be useful if I start Solr with a memory dump configured, in case of
a crash? For example:

   /opt/solr-6.6.2/bin/solr -m 32g -e cloud -z localhost:2181 \
     -a "-XX:+HeapDumpOnOutOfMemoryError" \
     -a "-XX:HeapDumpPath=/opt/solr-6.6.2/example/cloud/node1/logs/archived"


Thank you,
Koji


On Fri, Dec 13, 2019 at 6:37 PM, Shawn Heisey <apache@elyograg.org>
wrote:

> On 12/13/2019 11:01 AM, Kojo wrote:
> > We had already changed the OS configuration before the last crash, so I
> > think that the problem is not there.
> >
> > ulimit -a
> > core file size          (blocks, -c) 0
> > data seg size           (kbytes, -d) unlimited
> > scheduling priority             (-e) 0
> > file size               (blocks, -f) unlimited
> > pending signals                 (-i) 257683
> > max locked memory       (kbytes, -l) 64
> > max memory size         (kbytes, -m) unlimited
> > open files                      (-n) 65535
> > pipe size            (512 bytes, -p) 8
> > POSIX message queues     (bytes, -q) 819200
> > real-time priority              (-r) 0
> > stack size              (kbytes, -s) 8192
> > cpu time               (seconds, -t) unlimited
> > max user processes              (-u) 65535
> > virtual memory          (kbytes, -v) unlimited
> > file locks                      (-x) unlimited
>
> Are you running this ulimit command as the same user that is running
> your Solr process?  It must be the same user to learn anything useful.
> This output indicates that the user that's running the ulimit command is
> allowed to start 64K processes, which I would think should be enough.
>
> Best guess here is that the actual user that's running Solr does *NOT*
> have its limits increased.  It may be a different user than you're using
> to run the ulimit command.
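>
> A quick way to confirm (a sketch; the pgrep pattern is an assumption,
> adjust it to whatever matches your Solr process):
>
>    # find the user actually running Solr, then check that user's limits
>    ps -o user= -p $(pgrep -f start.jar)
>    sudo -u <that-user> bash -c 'ulimit -a'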
>
> > When does Solr try to delete a znode? I'm sorry, because I understand
> > nothing about this process, and it is the only point that seems
> > suspicious to me.
> > Do you think that it can cause inconsistency leading to the OOM problem?
>
> OOME isn't caused by inconsistencies at the application level.  It's a
> low-level problem, an indication that Java tried to do something
> required to run the program that it couldn't do.
>
> I assume that it's Solr trying to delete the znode, because the node
> path has solr in it.  It will be the ZK client running inside Solr
> that's actually trying to do the work, but Solr code probably initiated it.
>
> > Just after this INFO message above, the ZK log starts to log thousands
> > of copies of the block of lines below, where it seems that ZK creates
> > and closes thousands of sessions.
>
> I responded to this thread because I have some knowledge about Solr.  I
> really have no idea what these additional ZK server logs might mean.
> The one that you quoted before was pretty straightforward, so I was able
> to understand it.
>
> Anything that gets logged after an OOME is suspect and may be useless.
> The execution of a Java program after OOME is unpredictable, because
> whatever was being run when the OOME was thrown did NOT successfully
> execute.
>
> Thanks,
> Shawn
>
