lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Wed, 03 Apr 2013 19:14:55 GMT
If you don't specify numShards after 4.1, you get an implicit doc router and it's up to you
to distribute updates. In the past, partitioning was done on the fly - but for shard splitting
and perhaps other features, we now divvy up the hash range up front based on numShards and
store it in ZooKeeper. Leaving numShards out is now how you take complete control of update
distribution yourself.
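
For what it's worth, a rough sketch of creating a collection so the ranges get assigned up
front - the host and port are placeholders and the parameters are from memory, so double
check them against your version:

curl 'http://<solr-host>:<port>/solr/admin/collections?action=CREATE&name=collection1&numShards=6&replicationFactor=2&collection.configName=solr-conf'

That should record a hash range per shard (and a compositeId router) in clusterstate.json
rather than the implicit router you're seeing now.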

- Mark

On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2003@gmail.com> wrote:

> The router says "implicit".  I did start from a blank zk state but perhaps
> I missed one of the ZkCLI commands?  One of my shards from the
> clusterstate.json is shown below.  What is the process that should be done
> to bootstrap a cluster other than the ZkCLI commands I listed above?  My
> process right now is to run those ZkCLI commands and then start Solr on all
> of the instances with a command like this:
> 
> java -server -Dshard=shard5 -DcoreName=shard5-core1
> -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf
> -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
> 
> I feel like maybe I'm missing a step.
> 
> "shard5":{
>        "state":"active",
>        "replicas":{
>          "10.38.33.16:7575_solr_shard5-core1":{
>            "shard":"shard5",
>            "state":"active",
>            "core":"shard5-core1",
>            "collection":"collection1",
>            "node_name":"10.38.33.16:7575_solr",
>            "base_url":"http://10.38.33.16:7575/solr",
>            "leader":"true"},
>          "10.38.33.17:7577_solr_shard5-core2":{
>            "shard":"shard5",
>            "state":"recovering",
>            "core":"shard5-core2",
>            "collection":"collection1",
>            "node_name":"10.38.33.17:7577_solr",
>            "base_url":"http://10.38.33.17:7577/solr"}}}
> 
> 
> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmiller@gmail.com> wrote:
> 
>> It should be part of your clusterstate.json. Some users have reported
>> trouble upgrading a previous zk install when this change came. I
>> recommended manually updating the clusterstate.json to have the right info,
>> and that seemed to work. Otherwise, I guess you have to start from a clean
>> zk state.
>> 
>> If you don't have that range information, I think there will be trouble.
>> Do you have a router type defined in the clusterstate.json?
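>> 
>> Roughly the shape you want - the range values here are made up, just to show
>> where they live:
>> 
>>   "collection1":{
>>     "shards":{
>>       "shard1":{
>>         "range":"80000000-aaa9ffff",
>>         "state":"active",
>>         "replicas":{...}},
>>       ...},
>>     "router":"compositeId"}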
>> 
>> - Mark
>> 
>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> 
>>> Where is this information stored in ZK?  I don't see it in the cluster
>>> state (or perhaps I don't understand it ;) ).
>>> 
>>> Perhaps something with my process is broken.  What I do when I start from
>>> scratch is the following
>>> 
>>> ZkCLI -cmd upconfig ...
>>> ZkCLI -cmd linkconfig ....
>>> 
>>> but I don't ever explicitly create the collection.  What should the steps
>>> from scratch be?  I am moving from an unreleased snapshot of 4.0, so I never
>>> did that previously either; perhaps I did create the collection in one of my
>>> steps to get this working but have forgotten it along the way.
>>> 
>>> 
>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>> 
>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a
>>>> collection is created - each shard gets a range, which is stored in
>>>> zookeeper. You should not be able to end up with the same id on different
>>>> shards - something very odd going on.
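>>>> 
>>>> A quick way to check what was assigned (sketch - the plain ZooKeeper client
>>>> script, pointed at the zkHost from your startup command):
>>>> 
>>>> zkCli.sh -server so-zoo1:2181 get /clusterstate.json
>>>> 
>>>> Each shard should carry a "range" entry if the assignment happened.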
>>>> 
>>>> Hopefully I'll have some time to try and help you reproduce. Ideally we
>>>> can capture it in a test case.
>>>> 
>>>> - Mark
>>>> 
>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>> 
>>>>> no, my thought was wrong, it appears that even with the parameter set I am
>>>>> seeing this behavior.  I've been able to duplicate it on 4.2.0 by indexing
>>>>> 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so.
>>>>> I will try this on 4.2.1 to see if I see the same behavior.
>>>>> 
>>>>> 
>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>> 
>>>>>> Since I don't have that many items in my index I exported all of the keys
>>>>>> for each shard and wrote a simple java program that checks for duplicates.
>>>>>> I found some duplicate keys on different shards; a grep of the files for
>>>>>> the keys found does indicate that they made it to the wrong places.  If you
>>>>>> notice, documents with the same ID are on shard 3 and shard 5.  Is it
>>>>>> possible that the hash is being calculated taking into account only the
>>>>>> "live" nodes?  I know that we don't specify the numShards param @ startup,
>>>>>> so could this be what is happening?
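>>>>>> 
>>>>>> For reference, assuming one key per line in those export files, something
>>>>>> like this turns up the cross-shard duplicates directly (only the core1
>>>>>> files, so a replica copy of the same key doesn't show up as a false
>>>>>> positive):
>>>>>> 
>>>>>> cat shard*-core1 | sort | uniq -d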
>>>>>> 
>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>>>>>> shard1-core1:0
>>>>>> shard1-core2:0
>>>>>> shard2-core1:0
>>>>>> shard2-core2:0
>>>>>> shard3-core1:1
>>>>>> shard3-core2:1
>>>>>> shard4-core1:0
>>>>>> shard4-core2:0
>>>>>> shard5-core1:1
>>>>>> shard5-core2:1
>>>>>> shard6-core1:0
>>>>>> shard6-core2:0
>>>>>> 
>>>>>> 
>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>> 
>>>>>>> Something interesting that I'm noticing as well, I just indexed 300,000
>>>>>>> items, and somehow 300,020 ended up in the index.  I thought perhaps I
>>>>>>> messed something up so I started the indexing again and indexed another
>>>>>>> 400,000 and I see 400,064 docs.  Is there a good way to find possible
>>>>>>> duplicates?  I had tried to facet on key (our id field) but that didn't
>>>>>>> give me anything with more than a count of 1.
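>>>>>>> 
>>>>>>> For reference, the kind of facet check I mean is roughly this (key is our
>>>>>>> id field, the host is one of our nodes; mincount=2 so only repeated values
>>>>>>> come back):
>>>>>>> 
>>>>>>> curl 'http://10.38.33.16:7575/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=key&facet.mincount=2&facet.limit=-1'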
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Ok, so clearing the transaction log allowed things to go again.  I am
>>>>>>>> going to clear the index and try to replicate the problem on 4.2.0 and
>>>>>>>> then I'll try on 4.2.1.
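>>>>>>>> 
>>>>>>>> (By clearing the transaction log I mean, roughly, removing the tlog files
>>>>>>>> under the data dir while the node is stopped - something like
>>>>>>>> rm /solr/data/shard5-core2/tlog/* for the bad core.)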
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> No, not that I know of, which is why I say we need to get to the
>>>>>>>>> bottom of it.
>>>>>>>>> 
>>>>>>>>> - Mark
>>>>>>>>> 
>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Mark
>>>>>>>>>> Is there a particular jira issue that you think may address this?  I
>>>>>>>>>> read through it quickly but didn't see one that jumped out.
>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2003@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I brought the bad one down and back up and it did nothing.  I can
>>>>>>>>>>> clear the index and try 4.2.1.  I will save off the logs and see if
>>>>>>>>>>> there is anything else odd.
>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmiller@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> It would appear it's a bug given what you have said.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any other exceptions would be useful. Might be best to start
>>>>>>>>>>>> tracking in a JIRA issue as well.
>>>>>>>>>>>> 
>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
>>>>>>>>>>>> 
>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to get to
>>>>>>>>>>>> the bottom of this and fix it, or determine if it's fixed in 4.2.1
>>>>>>>>>>>> (spreading to mirrors now).
>>>>>>>>>>>> 
>>>>>>>>>>>> - Mark
>>>>>>>>>>>> 
>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Sorry I didn't ask the obvious question.  Is there anything else
>>>>>>>>>>>>> that I should be looking for here, and is this a bug?  I'd be happy
>>>>>>>>>>>>> to troll through the logs further if more information is needed,
>>>>>>>>>>>>> just let me know.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also, what is the most appropriate mechanism to fix this?  Is it
>>>>>>>>>>>>> required to kill the index that is out of sync and let solr resync
>>>>>>>>>>>>> things?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> sorry for spamming here....
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>> SEVERE: shard update error StdNode:
>>>>>>>>>>>>>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
>>>>>>>>>>>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
>>>>>>>>>>>>>> status:503, message:Service Unavailable
>>>>>>>>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>>>>>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>>>>>>>>>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>>>>>>>>>>>     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>>>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> here is another one that looks interesting
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>>>>>>>>>>>>>>> the leader, but locally we don't think so
>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>>>>>>>>>>>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>>>>>>>>>>>     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>>>>>>>>>>>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>>>>>>>>>>>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>>>>>>>>>>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>>>>>>>>>>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>>>>>>>>>>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>>>>>>>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there were
>>>>>>>>>>>>>>>> shards that went down.  I am seeing things like what is below.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating...
>>>>>>>>>>>>>>>> (live nodes size: 12)
>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>> INFO: Running the leader process.
>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here.
>>>>>>>>>>>>>>>>> Peersync does not look at that - it looks at version numbers for
>>>>>>>>>>>>>>>>> updates in the transaction log - it compares the last 100 of them
>>>>>>>>>>>>>>>>> on leader and replica.  What it's saying is that the replica seems
>>>>>>>>>>>>>>>>> to have versions that the leader does not.  Have you scanned the
>>>>>>>>>>>>>>>>> logs for any interesting exceptions?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing?  Did any zk
>>>>>>>>>>>>>>>>> session timeouts occur?
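>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> A rough way to spot those (adjust the file name to wherever your
>>>>>>>>>>>>>>>>> nodes actually log):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> grep -iE "expired|exception|recover|election" solr.log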
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and
>>>>>>>>>>>>>>>>>> noticed a strange issue while testing today.  Specifically the
>>>>>>>>>>>>>>>>>> replica has a higher version than the master which is causing
>>>>>>>>>>>>>>>>>> the index to not replicate.  Because of this the replica has
>>>>>>>>>>>>>>>>>> fewer documents than the master.  What could cause this and how
>>>>>>>>>>>>>>>>>> can I resolve it short of taking down the index and scping the
>>>>>>>>>>>>>>>>>> right version in?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> MASTER:
>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>> Num Docs:164880
>>>>>>>>>>>>>>>>>> Max Doc:164880
>>>>>>>>>>>>>>>>>> Deleted Docs:0
>>>>>>>>>>>>>>>>>> Version:2387
>>>>>>>>>>>>>>>>>> Segment Count:23
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> REPLICA:
>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>> Num Docs:164773
>>>>>>>>>>>>>>>>>> Max Doc:164773
>>>>>>>>>>>>>>>>>> Deleted Docs:0
>>>>>>>>>>>>>>>>>> Version:3001
>>>>>>>>>>>>>>>>>> Segment Count:30
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> in the replica's log it says this:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> INFO: Creating new http client,
>>>>>>>>>>>>>>>>>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>> START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our
>>>>>>>>>>>>>>>>>> versions are newer. ourLowThreshold=1431233788792274944
>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>> DONE. sync succeeded
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> which again seems to indicate that it thinks it has a newer
>>>>>>>>>>>>>>>>>> version of the index, so it aborts.  This happened while having
>>>>>>>>>>>>>>>>>> 10 threads indexing 10,000 items each, writing to a 6 shard
>>>>>>>>>>>>>>>>>> (1 replica each) cluster.  Any thoughts on this or what I should
>>>>>>>>>>>>>>>>>> look for would be appreciated.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

