lucene-solr-user mailing list archives

From mmb1234 <>
Subject Re: Hard commits blocked | non-solrcloud v6.6.2
Date Sun, 11 Feb 2018 06:58:03 GMT
Hi Shawn, Erik

> updates should slow down but not deadlock. 
The net effect is the same. As the CLOSE_WAITs increase, the JVM ultimately
stops accepting new socket requests, at which point `kill <solrpid>` is the
only option.

This also means that if the replication handler is invoked, which sets the
deletion policy, the number of blocked threads rises even faster and the
system fails sooner.

Each Solr POST is a blocking call, hence the CLOSE_WAITs. Also, each gzipped
POST body is a JSON array of 100 JSON objects (1 JSON doc = 1 Solr doc).

All custom AbstractSolrEventListener listeners were disabled so that they do
not process any post-commit events. Those threads are in WAITING state, which
is ok.

I then ran /solr/58f449cec94a2c75-core-256/admin/luke at 10:30pm PST

It showed "lastModified: 2018-02-11T04:46:54.540Z", indicating that commits
had been blocked for about 2 hours.
Hard commit is set to 10 seconds in solrconfig.xml.
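
For anyone wanting to repeat the lastModified check across cores, here is a minimal sketch. It assumes the Luke handler is asked for JSON (`wt=json`) and that Solr listens on localhost:8983; the host, the port and the helper name `last_modified` are assumptions, not something taken from this setup.

```shell
#!/bin/sh
# Hypothetical helper: print the lastModified value found in a Luke
# JSON response read from stdin (first match only).
last_modified() {
    sed -n 's/.*"lastModified" *: *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (host and port are assumptions; the core name is the one above):
# curl -s "http://localhost:8983/solr/58f449cec94a2c75-core-256/admin/luke?numTerms=0&wt=json" | last_modified
```

If the printed timestamp lags the current time by far more than the hard commit interval while updates are still arriving, commits are not completing on that core.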

Other cores are also blocked for a while.

The thread dump and top output from that condition are at

The netstat CLOSE_WAIT counts are correlated with DirectUpdateHandler2 /
UpdateRequestProcessor.processAdd() requests.

solr [ /tmp ]$ sudo netstat -ptan | awk '{print $6 " " $7 }' | sort | uniq -c; TZ=PST8PDT date;
   7728 CLOSE_WAIT -
      1 FIN_WAIT2 -
      1 Foreign Address
      6 LISTEN -
     36 TIME_WAIT -
      1 established)
Sat Feb 10 22:27:07 PST 2018

The thread dump shows about 6,700 threads in TIMED_WAITING, and 6584 threads
with this stack:
at java.lang.Object.wait(Native Method)
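
A slightly tighter way to get the per-state socket counts, as a sketch only: it counts just the rows whose first column starts with `tcp`, so the netstat header lines ("Foreign Address", "established)") do not show up in the totals. The helper name `count_tcp_states` is hypothetical, and it assumes Linux net-tools netstat output where the state is the sixth column.

```shell
#!/bin/sh
# Hypothetical helper: count TCP connections per state from
# `netstat -tan` style output read on stdin. Header lines are skipped
# because their first field does not start with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print count[s], s }' | sort -k2
}

# Usage:
# sudo netstat -ptan | count_tcp_states; TZ=PST8PDT date
```

Running this on an interval alongside thread dumps should make it easier to line up CLOSE_WAIT growth with the blocked update threads.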

Only `top` available on Photon OS is
Those screenshots are attached.
