lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Tue, 02 Apr 2013 23:21:48 GMT
Sorry I didn't ask the obvious question.  Is there anything else that I
should be looking for here and is this a bug?  I'd be happy to troll
through the logs further if more information is needed, just let me know.

Also, what is the most appropriate mechanism to fix this?  Is it required to
kill the out-of-sync index and let Solr resync things?
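For reference, one way to ask an out-of-sync replica to recover from its leader without scp-ing indexes around is the core admin REQUESTRECOVERY command available in Solr 4.x. A minimal sketch, assuming the host and core name that appear in the logs below (10.38.33.17:7577, dsc-shard5-core2); adjust for your cluster:

```python
# Sketch: trigger recovery for a SolrCloud core via the core admin API
# (REQUESTRECOVERY, Solr 4.x). Host and core name are taken from this
# thread's logs and are only examples.
from urllib.parse import urlencode
from urllib.request import urlopen

def recovery_url(base, core):
    """Build the core admin URL that asks `core` to recover from its leader."""
    return "%s/admin/cores?%s" % (
        base, urlencode({"action": "REQUESTRECOVERY", "core": core}))

url = recovery_url("http://10.38.33.17:7577/solr", "dsc-shard5-core2")
print(url)
# To actually trigger recovery against a live node:
# urlopen(url).read()
```

If recovery still refuses to catch up, the blunter option is to stop the node, delete the core's data directory, and let it replicate fully from the leader on restart.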


On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:

> sorry for spamming here....
>
> shard5-core2 is the instance we're having issues with...
>
> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> SEVERE: shard update error StdNode:
> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
> status:503, message:Service Unavailable
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>         at
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>         at
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
>
> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
>> here is another one that looks interesting
>>
>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>> the leader, but locally we don't think so
>>         at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>         at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>         at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>         at
>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>         at
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>         at
>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>         at
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>         at
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>         at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>         at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>         at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>
>>
>>
>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>
>>> Looking at the master it looks like at some point there were shards that
>>> went down.  I am seeing things like what is below.
>>>
>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
>>> nodes size: 12)
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
>>> process
>>> INFO: Updating live nodes... (9)
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> runLeaderProcess
>>> INFO: Running the leader process.
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> shouldIBeLeader
>>> INFO: Checking if I should try and be the leader.
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> shouldIBeLeader
>>> INFO: My last published State was Active, it's okay to be the leader.
>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
>>> runLeaderProcess
>>> INFO: I may be the new leader - try and sync
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>
>>>> I don't think the versions you are thinking of apply here. PeerSync
>>>> does not look at that - it looks at version numbers for updates in the
>>>> transaction log and compares the last 100 of them on the leader and
>>>> replica. What it's saying is that the replica seems to have versions
>>>> that the leader does not. Have you scanned the logs for any interesting
>>>> exceptions?
>>>>
>>>> Did the leader change during the heavy indexing? Did any zk session
>>>> timeouts occur?
>>>>
>>>> - Mark
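To answer the leader-change and zk-timeout questions, scanning the logs for a few telltale substrings is usually enough. A small sketch; the marker strings are examples of messages seen in Solr 4.x logs (two of them appear in this very thread), not a complete list:

```python
# Sketch: flag log lines suggesting ZooKeeper session trouble or a leader
# change. MARKERS holds example substrings, not an exhaustive set.
MARKERS = (
    "Session expired",
    "zkClient has disconnected",
    "I may be the new leader",
    "ClusterState says we are the leader",
)

def suspicious_lines(log_lines):
    """Return (line_number, line) pairs that match any marker."""
    return [(i, line) for i, line in enumerate(log_lines, 1)
            if any(m in line for m in MARKERS)]
```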
>>>>
>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>
>>>> > I am currently looking at moving our Solr cluster to 4.2 and noticed
>>>> > a strange issue while testing today.  Specifically, the replica has a
>>>> > higher version than the master, which is causing the index to not
>>>> > replicate.  Because of this the replica has fewer documents than the
>>>> > master.  What could cause this, and how can I resolve it short of
>>>> > taking down the index and scp-ing the right version in?
>>>> >
>>>> > MASTER:
>>>> > Last Modified:about an hour ago
>>>> > Num Docs:164880
>>>> > Max Doc:164880
>>>> > Deleted Docs:0
>>>> > Version:2387
>>>> > Segment Count:23
>>>> >
>>>> > REPLICA:
>>>> > Last Modified: about an hour ago
>>>> > Num Docs:164773
>>>> > Max Doc:164773
>>>> > Deleted Docs:0
>>>> > Version:3001
>>>> > Segment Count:30
>>>> >
>>>> > in the replicas log it says this:
>>>> >
>>>> > INFO: Creating new http client,
>>>> > config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>> >
>>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>> >
>>>> > INFO: PeerSync: core=dsc-shard5-core2
>>>> > url=http://10.38.33.17:7577/solr START replicas=[
>>>> > http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>> >
>>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>> >
>>>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>> > Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>> >
>>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>> >
>>>> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>> > Our versions are newer. ourLowThreshold=1431233788792274944
>>>> > otherHigh=1431233789440294912
>>>> >
>>>> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>> >
>>>> > INFO: PeerSync: core=dsc-shard5-core2
>>>> > url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>>> >
>>>> >
>>>> > which again seems to indicate that it thinks it has a newer version
>>>> > of the index, so it aborts.  This happened while 10 threads were each
>>>> > indexing 10,000 items into a 6-shard (1 replica each) cluster.  Any
>>>> > thoughts on this or what I should look for would be appreciated.
>>>>
>>>>
>>>
>>
>
