lucene-solr-user mailing list archives

From Markus Jelsma <markus.jelsma@openindex.io>
Subject RE: Solr node crashes while indexing - Too many open files
Date Thu, 30 Jun 2016 11:13:45 GMT
Yes, that is quite normal for a busy search engine, especially in cloud environments. We always
start by increasing it to a minimum of 64k when provisioning machines.
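
For login sessions that is usually a couple of lines in /etc/security/limits.conf, e.g.
(assuming Solr runs as a dedicated "solr" user):

    # raise both the soft and hard open-file limits for the solr user
    solr  soft  nofile  65536
    solr  hard  nofile  65536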
Markus
 
-----Original message-----
> From: Mads Tomasgård Bjørgan <mtb@dips.no>
> Sent: Thursday 30th June 2016 13:05
> To: solr-user@lucene.apache.org
> Subject: RE: Solr node crashes while indexing - Too many open files
> 
> That's true, but I was hoping there would be another way to solve this issue, as raising
> the limit isn't considered preferable in our situation.
> 
> Is it normal behavior for Solr to open over 4000 files without closing them properly?
> Is it, for example, possible to adjust the autoCommit settings in solrconfig.xml to force
> Solr to close the files?
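> 
> For reference, this is the section of solrconfig.xml we would be tuning; the values below
> are illustrative, not our actual configuration:
> 
>     <autoCommit>
>       <maxTime>15000</maxTime>           <!-- hard commit at most every 15 seconds -->
>       <maxDocs>10000</maxDocs>           <!-- or after 10,000 uncommitted documents -->
>       <openSearcher>false</openSearcher> <!-- commit without opening a new searcher -->
>     </autoCommit>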
> 
> Any help is appreciated :-)
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
> Sent: Thursday, 30 June 2016 11:41
> To: solr-user@lucene.apache.org
> Subject: RE: Solr node crashes while indexing - Too many open files
> 
> Mads, some distributions require different steps for increasing max_open_files. Check
> how it works for CentOS specifically.
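> 
> On CentOS 7 specifically, if Solr runs as a systemd service, limits set in
> /etc/security/limits.conf will not apply to it; the limit has to be set on the unit
> itself. Something along these lines (the service name "solr" is a guess for your setup):
> 
>     # /etc/systemd/system/solr.service.d/limits.conf
>     [Service]
>     LimitNOFILE=65536
> 
> Then run "systemctl daemon-reload" and restart the service.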
> 
> Markus
> 
>  
>  
> -----Original message-----
> > From: Mads Tomasgård Bjørgan <mtb@dips.no>
> > Sent: Thursday 30th June 2016 10:52
> > To: solr-user@lucene.apache.org
> > Subject: Solr node crashes while indexing - Too many open files
> > 
> > Hello,
> > We're indexing a large set of files with Solr 6.1.0, running as a SolrCloud coordinated by
> > ZooKeeper 3.4.8.
> > 
> > We have two ensembles, and each cluster runs on its own set of three VMs (CentOS 7). We
> > first thought the error was caused by CDCR, as we were indexing a large number of documents
> > that had to be replicated to the target cluster. However, we got the same error even after
> > turning off CDCR, which indicates CDCR wasn't the problem after all.
> > 
> > After indexing between 20,000 and 35,000 documents to the source cluster, the file
> > descriptor count for one of the Solr nodes reaches 4096 and that node crashes. The count
> > grows roughly linearly over time. The remaining two nodes in the cluster are not affected
> > at all, and their logs contain nothing relevant. We found the following errors in the
> > crashing node's log:
> > 
> > 2016-06-30 08:23:12.459 ERROR (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1 x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.u.StreamingSolrClients error
> > java.net.SocketException: Too many open files
> >                 (...)
> > 2016-06-30 08:23:12.460 ERROR (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1 x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.u.StreamingSolrClients error
> > java.net.SocketException: Too many open files
> >                 (...)
> > 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.h.RequestHandlerBase org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 2 Async exceptions during distributed update:
> > Too many open files
> > Too many open files
> >                 (...)
> > 2016-06-30 08:23:12.461 INFO  (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1]  webapp=/solr path=/update params={version=2.2} status=-1 QTime=5
> > 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 2 Async exceptions during distributed update:
> > Too many open files
> > Too many open files
> >                 (....)
> > 
> > 2016-06-30 08:23:12.461 WARN  (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall invalid return code: -1
> > 2016-06-30 08:23:38.108 INFO  (qtp314337396-20) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1]  webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=https://10.0.106.115:443/solr/DIPS_shard1_replica1/&rows=10&version=2&q=*:*&NOW=1467275018057&isShard=true&wt=javabin&_=1467275017220} hits=30218 status=0 QTime=1
> > 
> > Running netstat -n -p on the VM that throws the exceptions reveals at least 1,800 TCP
> > connections waiting to be closed (we didn't count them exactly; the output filled the
> > entire PuTTY window, some 2,000 lines):
> > tcp6      70      0 10.0.106.115:34531      10.0.106.114:443        CLOSE_WAIT  21658/java
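> > A quick way to tally the states instead of scrolling through the whole output (plain
> > shell; with netstat -n the TCP state is column 6):
> > 
> >     netstat -n | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn
> > 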
> > We're running the SolrCloud on port 443, and the IPs belong to the VMs. We also tried
> > raising the ulimit on the machine to 100,000, without any effect.
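> > 
> > In case it helps to verify whether a raised limit actually reaches the running Solr
> > process (a ulimit set in a shell only affects processes started from that shell), the
> > effective limit can be read from /proc; 21658 is the Solr PID from the netstat line above:
> > 
> >     grep 'open files' /proc/21658/limits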
> > 
> > Greetings,
> > Mads
> > 
> 
