lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: High CPU usage with Solr 7.7.0
Date Fri, 29 Mar 2019 14:21:51 GMT
Thanks all. I pushed changes last night; this should be fixed in 7.7.2, 8.1 and master.

Meanwhile, this is a trivial one-line change, so there are two ways to get by:

1> Just make the change yourself locally. Building Solr from scratch is actually not hard.
The “ant package” target will get you the same thing you’d get from downloading the
distribution.

2> use Java 9 or greater.
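
For anyone going the local-change route, here is a minimal sketch of the idea, assuming the change amounts to giving the scheduled pool one core thread instead of zero (which is what JDK-8129861, linked in Adam's note below, points at). The class and method names are made up for illustration; this is not the actual CommitTracker code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the Solr source tree.
public class SchedulerPoolSketch {

    // On JDK 8 and earlier, a scheduled pool created with a core size of 0
    // can busy-spin while a delayed task is pending (JDK-8129861).
    // Using a core size of 1 avoids that code path.
    static ScheduledExecutorService newScheduler() {
        return Executors.newScheduledThreadPool(1); // was 0 in the problematic pattern
    }

    // Schedules one delayed task, waits for its result, then shuts the pool down.
    static String runDelayed(long delayMs) throws Exception {
        ScheduledExecutorService scheduler = newScheduler();
        try {
            Callable<String> task = () -> "committed";
            ScheduledFuture<String> future =
                scheduler.schedule(task, delayMs, TimeUnit.MILLISECONDS);
            return future.get(5, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDelayed(100));
    }
}
```

The trade-off is one idle thread per scheduler, which should be negligible next to a spinning CPU.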

Best,
Erick

> On Mar 25, 2019, at 1:58 AM, Lukas Weiss <Lukas.Weiss@raiffeisen.it> wrote:
> 
> I am forwarding this message. Thanks, Adam.
> 
> Hi,
> Apologies, I can’t figure out how to reply to the Solr mailing list.
> I just ran across the same high CPU usage issue. I believe it’s caused by 
> this commit, which was introduced in Solr 7.7.0: 
> https://github.com/apache/lucene-solr/commit/eb652b84edf441d8369f5188cdd5e3ae2b151434#diff-e54b251d166135a1afb7938cfe152bb5
> There is a bug in JDK versions <=8 where using 0 threads in the 
> ScheduledThreadPool causes high CPU usage: 
> https://bugs.openjdk.java.net/browse/JDK-8129861
> Oddly, the latest version of 
> solr/core/src/java/org/apache/solr/update/CommitTracker.java on master 
> still uses 0 executors as the default. Presumably most everyone is using 
> JDK 9 or greater, which has the bug fixed, so they don’t experience it.
> Feel free to relay this back to the mailing list.
> Thanks,
> Adam Guthrie
> 
> 
> 
> 
> 
> From:    "Lukas Weiss" <Lukas.Weiss@raiffeisen.it>
> To:      solr-user@lucene.apache.org
> Date:    27.02.2019 11:13
> Subject: High CPU usage with Solr 7.7.0
> 
> 
> 
> Hello,
> 
> We recently updated our Solr server from 6.6.5 to 7.7.0. Since then, we 
> have had problems with the server's CPU usage.
> We have two Solr cores configured, but even if we clear all indexes and do 
> not start the index process, we see 100% CPU usage for both cores.
> 
> Here's what our top says:
> 
> root@solr:~ # top
> top - 09:25:24 up 17:40,  1 user,  load average: 2,28, 2,56, 2,68
> Threads:  74 total,   3 running,  71 sleeping,   0 stopped,   0 zombie
> %Cpu0  :100,0 us,  0,0 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> %Cpu1  :100,0 us,  0,0 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> %Cpu2  : 11,3 us,  1,0 sy,  0,0 ni, 86,7 id,  0,7 wa,  0,0 hi,  0,3 si,  0,0 st
> %Cpu3  :  3,0 us,  3,0 sy,  0,0 ni, 93,7 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
> KiB Mem :  8388608 total,  7859168 free,   496744 used,    32696 buff/cache
> KiB Swap:  2097152 total,  2097152 free,        0 used.  7859168 avail Mem
> 
>   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND   P
> 10209 solr      20   0 6138468 452520  25740 R 99,9  5,4  29:43.45 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 24
> 10214 solr      20   0 6138468 452520  25740 R 99,9  5,4  28:42.91 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 25
> 
> The solr server is installed on a Debian Stretch 9.8 (64bit) on Linux LXC 
> dedicated Container.
> 
> Some more server info:
> 
> root@solr:~ # java -version
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-2~deb9u1-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> 
> root@solr:~ # free -m
>              total        used        free      shared  buff/cache   available
> Mem:           8192         484        7675         701          31        7675
> Swap:          2048           0        2048
> 
> We also found something strange: if we run strace on the main process, we 
> get lots of ongoing connection timeouts:
> 
> root@solr:~ # strace -F -p 4136
> strace: Process 4136 attached with 48 threads
> strace: [ Process PID=11089 runs in x32 mode. ]
> [pid  4937] epoll_wait(139,  <unfinished ...>
> [pid  4936] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4909] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4618] epoll_wait(136,  <unfinished ...>
> [pid  4576] futex(0x7ff61ce66474, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> [pid  4279] futex(0x7ff61ce62b34, FUTEX_WAIT_PRIVATE, 2203, NULL <unfinished ...>
> [pid  4244] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4227] futex(0x7ff56c71ae14, FUTEX_WAIT_PRIVATE, 2237, NULL <unfinished ...>
> [pid  4243] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4228] futex(0x7ff5608331a4, FUTEX_WAIT_PRIVATE, 2237, NULL <unfinished ...>
> [pid  4208] futex(0x7ff61ce63e54, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
> [pid  4205] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4204] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4196] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4195] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4194] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4193] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4187] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> [pid  4180] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4179] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4177] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4174] accept(133,  <unfinished ...>
> [pid  4173] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4172] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4171] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> [pid  4165] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4164] futex(0x7ff61c1f5054, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
> [pid  4163] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4162] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4161] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4160] futex(0x7ff623d52c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
> [pid  4159] futex(0x7ff61c1e9d54, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> [pid  4158] futex(0x7ff61c1b7f54, FUTEX_WAIT_PRIVATE, 15, NULL <unfinished ...>
> [pid  4157] futex(0x7ff61c1b5554, FUTEX_WAIT_PRIVATE, 19, NULL <unfinished ...>
> [pid  4156] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4155] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> [pid  4153] futex(0x7ff61c06c754, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> [pid  4152] futex(0x7ff61c06ab54, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
> [pid  4151] futex(0x7ff61c068f54, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> [pid  4150] futex(0x7ff61c067354, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> [pid  4148] futex(0x7ff61c024a54, FUTEX_WAIT_PRIVATE, 403, NULL <unfinished ...>
> [pid  4165] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=849859736}, 0xffffffff <unfinished ...>
> [pid  4147] futex(0x7ff61c022e54, FUTEX_WAIT_PRIVATE, 415, NULL <unfinished ...>
> [pid  4146] futex(0x7ff61c021254, FUTEX_WAIT_PRIVATE, 397, NULL <unfinished ...>
> [pid  4145] futex(0x7ff61c01f654, FUTEX_WAIT_PRIVATE, 405, NULL <unfinished ...>
> [pid  4144] futex(0x7ff61c00e354, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> [pid  4136] futex(0x7ff624b729d0, FUTEX_WAIT, 4144, NULL <unfinished ...>
> [pid  4165] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=900162344}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=950365105}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=586325}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=50791977}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=100997890}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=151206817}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=201402531}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=251616284}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=301813556}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=352036802}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=402239182}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=452439835}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=502635489}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=552844020}, 0xffffffff <unfinished ...>
> [pid  4156] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid  4156] futex(0x7ff61c1aba28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4156] futex(0x7ff61c1aba54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564858, tv_nsec=506449064}, 0xffffffff <unfinished ...>
> [pid  4165] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=603013734}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=653149664}, 0xffffffff^Cstrace: Process 4136 detached
> strace: Process 4144 detached
> strace: Process 4145 detached
> strace: Process 4146 detached
> strace: Process 4147 detached
> strace: Process 4148 detached
> strace: Process 4150 detached
> strace: Process 4151 detached
> strace: Process 4152 detached
> strace: Process 4153 detached
> ....
> 
> 
> Could you help us to determine what's wrong with our setup?
> 
> Thank you very much,
> 
> Kind regards
> Lukas Weiss
> 

