zookeeper-user mailing list archives

From Andor Molnar <an...@apache.org>
Subject Re: zookeeper / solr cloud problems
Date Mon, 06 Jan 2020 13:44:32 GMT
Hi Koji,

I reckon it would be best to raise this issue on the Solr user list.
I’m not sure you will get any more help with it here.


> On 2019. Dec 14., at 1:09, Kojo <rbsnkjmr@gmail.com> wrote:
> Shawn,
> unfortunately, these ulimit values are for the solr user. I already checked
> the zk user, and we set the same values there.
> There is no constraint on process creation.
> This box has 128 GB of RAM, and Solr starts with a 32 GB heap. There is only one small
> collection of ~400k documents.
> I see no resource constraints.
> I see nothing at the application level (Python) doing anything wrong.
> I am looking for any clue to solve this problem.
> Would it be useful to start Solr with a heap dump enabled, in case of a crash?
>   -
>   /opt/solr-6.6.2/bin/solr -m 32g -e cloud -z localhost:2181 -a
>   "-XX:+HeapDumpOnOutOfMemoryError" -a
>   "-XX:HeapDumpPath=/opt/solr-6.6.2/example/cloud/node1/logs/archived"
> Thank you,
> Koji
> On Fri, Dec 13, 2019 at 18:37, Shawn Heisey <apache@elyograg.org> wrote:
>> On 12/13/2019 11:01 AM, Kojo wrote:
>>> We had already changed the OS configuration before the last crash, so I think
>>> that the problem is not there.
>>> ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 257683
>>> max locked memory       (kbytes, -l) 64
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 65535
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 8192
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 65535
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>> Are you running this ulimit command as the same user that is running
>> your Solr process?  It must be the same user to learn anything useful.
>> This output indicates that the user that's running the ulimit command is
>> allowed to start 64K processes, which I would think should be enough.
>> Best guess here is that the actual user that's running Solr does *NOT*
>> have its limits increased.  It may be a different user than you're using
>> to run the ulimit command.
>>> When Solr tries to delete a znode? I'm sorry, because I understand nothing
>>> about this process, and it is the only point that seems suspicious to me.
>>> Do you think that it can cause an inconsistency leading to the OOM problem?
>> OOME isn't caused by inconsistencies at the application level.  It's a
>> low-level problem, an indication that Java tried to do something
>> required to run the program that it couldn't do.
>> I assume that it's Solr trying to delete the znode, because the node
>> path has solr in it.  It will be the ZK client running inside Solr
>> that's actually trying to do the work, but Solr code probably initiated it.
>>> Just after this INFO message above, the ZK log starts to log thousands of this
>>> block of lines below. It seems that ZK creates and closes thousands
>>> of sessions.
>> I responded to this thread because I have some knowledge about Solr.  I
>> really have no idea what these additional ZK server logs might mean.
>> The one that you quoted before was pretty straightforward, so I was able
>> to understand it.
>> Anything that gets logged after an OOME is suspect and may be useless.
>> The execution of a Java program after OOME is unpredictable, because
>> whatever was being run when the OOME was thrown did NOT successfully
>> execute.
>> Thanks,
>> Shawn
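
On Shawn's point that `ulimit -a` only reports the limits of the user running the command: on Linux, one way to sidestep the question of which user to check is to read the limits of the running process itself from `/proc/<pid>/limits`. The following is a minimal sketch, assuming a Linux host; the helper names `parse_limits` and `limits_for_pid` are hypothetical and not part of Solr or ZooKeeper.

```python
import re
from pathlib import Path


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns in this file are separated by runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def limits_for_pid(pid):
    """Read the effective limits of a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())
```

Calling `limits_for_pid(pid)` with the PID of the Solr JVM and looking at the `"Max processes"` and `"Max open files"` entries shows the values the process actually runs with, whichever user started it. Depending on the kernel version, reading that file may require being the same user or root.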
