lucene-solr-user mailing list archives

From Veera Raghavan <veera.raghavan...@gmail.com>
Subject Re: Solr Cores going down in Solrcloud 4.3.1
Date Fri, 07 Mar 2014 23:15:27 GMT
I dug deeper and found the following exception, thrown while the replica
tries to replicate.

ERROR - 2014-03-07 23:08:35.454; org.apache.solr.common.SolrException; SnapPull failed :org.apache.lucene.store.AlreadyClosedException: Already closed
    at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:336)
    at org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:806)
    at org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:522)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:464)
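The top frame suggests that CachingDirectoryFactory.get refused the request because the factory itself had already been closed when SnapPuller asked it for a directory. Why it was closed (for instance a core close or reload racing with recovery) is a guess; these logs do not show it. The Python sketch below only models that check-after-close pattern under that assumption. CachingFactorySketch and AlreadyClosedError are hypothetical stand-ins, not Solr or Lucene classes.

```python
import threading


class AlreadyClosedError(RuntimeError):
    """Hypothetical stand-in for Lucene's AlreadyClosedException."""


class CachingFactorySketch:
    """Hypothetical, much-simplified cache of open directories keyed by path.

    Not Solr's CachingDirectoryFactory; it only illustrates the
    "refuse every get() once closed" behaviour the stack trace hints at.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self._cache = {}  # path -> opened "directory" (a plain dict here)

    def get(self, path):
        with self._lock:
            # Once closed, every lookup fails, even for previously cached paths.
            if self._closed:
                raise AlreadyClosedError("Already closed")
            return self._cache.setdefault(path, {"path": path})

    def close(self):
        with self._lock:
            self._closed = True
            self._cache.clear()


factory = CachingFactorySketch()
factory.get("/data/index")          # works while the factory is open
factory.close()                     # e.g. the owning core shuts down
try:
    factory.get("/data/index")      # a replication thread still running hits this
except AlreadyClosedError as e:
    print(e)                        # prints: Already closed
```

If that reading is right, the failure is a timing problem between two threads rather than a corrupt index, which would fit a race-condition fix in a later release; that is still an assumption until the JIRA confirms it.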


I looked through the SolrCloud code and found that this exception is thrown
while the replication triggered by RecoveryStrategy is trying to open the
index directory. I searched the Solr JIRA and found this issue,
https://issues.apache.org/jira/browse/SOLR-4960, which looks closely related
to mine (though I do not know for sure).

Can anyone familiar with this JIRA issue tell me whether the problem will go
away if we upgrade to 4.4?

Thanks again
Nitin




On Fri, Mar 7, 2014 at 11:46 AM, Veera Raghavan <veera.raghavan.mp@gmail.com> wrote:

> Forgot to attach the log from when the recovery failed:
>
> From solr.log.129:
>
> ERROR - 2014-03-06 13:29:31.909; org.apache.solr.common.SolrException; Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
>     at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
>
> INFO  - 2014-03-06 13:29:31.910; org.apache.solr.update.UpdateLog; Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=tlog{file=/mnt/search/solr/testcollection_shard1_replica2/data/tlog/tlog.0000000000000000000 refcount=1}}
>
> ERROR - 2014-03-06 13:29:31.910; org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (7) core=testcollection_shard1_replica2
>
>
> On Fri, Mar 7, 2014 at 11:24 AM, Veera Raghavan <veera.raghavan.mp@gmail.com> wrote:
>
>> Hi there
>>
>>   I have a 6-node SolrCloud cluster with 50 collections. All collections
>> are sharded across all 6 nodes. I am seeing weird behavior where both
>> replicas of a shard go down, move to a "recovering" state, and never come
>> back (with no specific correlation to writes or reads).
>>
>>  As a band-aid, I am manually unloading and recreating the cores.
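The manual unload/recreate band-aid maps onto two CoreAdmin API actions, UNLOAD and CREATE. A minimal sketch that only builds the request URLs, assuming a node at a hypothetical solr-host:8983 with the /solr context and that the recreated core should rejoin the same collection and shard via CREATE's collection and shard parameters:

```python
from urllib.parse import urlencode

# Hypothetical base URL; the thread only shows "<ip>:8983" with context "solr".
BASE = "http://solr-host:8983/solr/admin/cores"


def unload_url(core):
    # CoreAdmin UNLOAD: take the named core off this node.
    return BASE + "?" + urlencode({"action": "UNLOAD", "core": core})


def create_url(core, collection, shard):
    # CoreAdmin CREATE: in SolrCloud, collection and shard register the new core
    # as a replica of that shard, which should then recover from the shard leader.
    params = {"action": "CREATE", "name": core,
              "collection": collection, "shard": shard}
    return BASE + "?" + urlencode(params)


print(unload_url("testcollection_shard1_replica2"))
print(create_url("testcollection_shard1_replica2", "testcollection", "shard1"))
```

Issuing the two URLs in order (for example with curl) reproduces the manual workaround; it does not address whatever closes the directory factory in the first place.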
>>
>> In the Solr logs I see this:
>>
>> org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null
>> path=/admin/cores
>> params={coreNodeName=<ip>:8983_solr_testcollection_shard1_replica1&state=recovering&nodeName=<ip>:8983_solr&action=PREPRECOVERY&checkLive=true&core=solr_testcollection_shard1_replica2&wt=javabin&onlyIfLeader=true&version=2}
>> status=0 QTime=99
>>
>>
>> Have any of you seen this issue before? Is it a known bug that can be
>> fixed with an upgrade? Or should I increase the ZooKeeper timeout?
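On the ZooKeeper timeout question: in the legacy (4.3-era) solr.xml, the ZooKeeper client session timeout is the zkClientTimeout attribute on <cores>, and the stock example writes it as ${zkClientTimeout:15000}, so starting each node with -DzkClientTimeout=<ms> overrides it without touching the file. Whether a longer timeout helps with this particular recovery loop is not established in this thread. The sketch below shows the equivalent edit to the file's default, assuming that legacy layout; the sample XML and the 30000 ms value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy solr.xml, assuming the 4.x example layout where
# zkClientTimeout is an attribute of <cores>.
LEGACY_SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="${jetty.port:8983}"
         hostContext="${hostContext:solr}" zkClientTimeout="${zkClientTimeout:15000}">
  </cores>
</solr>"""


def set_zk_client_timeout(xml_text, millis):
    """Return solr.xml text with the <cores> zkClientTimeout default set to millis."""
    root = ET.fromstring(xml_text)
    cores = root.find("cores")
    # Keep the ${zkClientTimeout:...} placeholder so -DzkClientTimeout still overrides it.
    cores.set("zkClientTimeout", "${zkClientTimeout:%d}" % millis)
    return ET.tostring(root, encoding="unicode")


updated = set_zk_client_timeout(LEGACY_SOLR_XML, 30000)
print(ET.fromstring(updated).find("cores").get("zkClientTimeout"))
# prints ${zkClientTimeout:30000}
```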
>>
>>
>> Any pointers are much appreciated.
>> Thanks
>> Veera
>>
>>
>>
>
