lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Thu, 04 Apr 2013 00:28:10 GMT


On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2003@gmail.com> wrote:

> I am not using the concurrent low pause garbage collector, I could look at
> switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC
> correct?

Right - if you don't add that, the default is almost always the throughput collector (I've only seen OS X buck this trend, back when Apple shipped its own Java). That means stop-the-world garbage collections, so with larger heaps, that can be a fair amount of time during which no threads can run. That's not great for something as interactive as search generally is, and it's especially bad when combined with heavy load and a 15 second session timeout between Solr and ZooKeeper.
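For concreteness, a startup sketch along these lines (heap sizes are placeholders, not a recommendation; zkClientTimeout raised to 30 seconds per the advice further down the thread):

```shell
# Sketch only: enable the concurrent low-pause (CMS) collector and raise the
# ZooKeeper client timeout from the 15s default to 30s. Heap sizes here are
# illustrative, not tuned values.
java -server -Xms2g -Xmx2g \
     -XX:+UseConcMarkSweepGC \
     -DzkClientTimeout=30000 \
     -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
     -jar start.jar
```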


The below is odd - a replica node is waiting for the leader to see it as recovering and live. "Live" means it has created an ephemeral node for that Solr CoreContainer in ZooKeeper, so it's very strange if that didn't happen, unless it happened during shutdown or something.

> 
> I also just had a shard go down and am seeing this in the log
> 
> SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
> down for 10.38.33.17:7576_solr but I still do not see the requested state.
> I see state: recovering live:false
>        at
> org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890)
>        at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>        at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
>        at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>        at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>        at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>        at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>        at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>        at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> 
> Nothing other than this in the log jumps out as interesting though.
> 
> 
> On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmiller@gmail.com> wrote:
> 
>> This shouldn't be a problem though, if things are working as they are
>> supposed to. Another node should simply take over as the overseer and
>> continue processing the work queue. It's just best if you configure so that
>> session timeouts don't happen unless a node is really down. On the other
>> hand, it's nicer to detect that faster. Your tradeoff to make.
>> 
>> - Mark
>> 
>> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> 
>>> Yeah. Are you using the concurrent low pause garbage collector?
>>> 
>>> This means the overseer wasn't able to communicate with zk for 15
>> seconds - due to load or gc or whatever. If you can't resolve the root
>> cause of that, or the load just won't allow for it, next best thing you can
>> do is raise it to 30 seconds.
>>> 
>>> - Mark
>>> 
>>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>> 
>>>> I am occasionally seeing this in the log, is this just a timeout issue?
>>>> Should I be increasing the zk client timeout?
>>>> 
>>>> WARNING: Overseer cannot talk to ZK
>>>> Apr 3, 2013 11:14:25 PM
>>>> org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
>>>> INFO: Watcher fired on path: null state: Expired type None
>>>> Apr 3, 2013 11:14:25 PM
>> org.apache.solr.cloud.Overseer$ClusterStateUpdater
>>>> run
>>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop
>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>> KeeperErrorCode = Session expired for /overseer/queue
>>>>      at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>>>>      at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>>>      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
>>>>      at
>>>> 
>> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236)
>>>>      at
>>>> 
>> org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233)
>>>>      at
>>>> 
>> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
>>>>      at
>>>> 
>> org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233)
>>>>      at
>>>> 
>> org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89)
>>>>      at
>>>> 
>> org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131)
>>>>      at
>>>> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326)
>>>>      at
>>>> 
>> org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128)
>>>>      at java.lang.Thread.run(Thread.java:662)
>>>> 
>>>> 
>>>> 
>>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2003@gmail.com>
>> wrote:
>>>> 
>>>>> just an update, I'm at 1M records now with no issues.  This looks
>>>>> promising as to the cause of my issues, thanks for the help.  Is the
>>>>> routing method with numShards documented anywhere?  I know numShards is
>>>>> documented but I didn't know that the routing changed if you don't
>> specify
>>>>> it.
>>>>> 
>>>>> 
>>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2003@gmail.com>
>> wrote:
>>>>> 
>>>>>> with these changes things are looking good, I'm up to 600,000
>> documents
>>>>>> without any issues as of right now.  I'll keep going and add more to
>> see if
>>>>>> I find anything.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2003@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> ok, so that's not a deal breaker for me.  I just changed it to match
>> the
>>>>>>> shards that are auto created and it looks like things are happy.
>> I'll go
>>>>>>> ahead and try my test to see if I can get things out of sync.
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmiller@gmail.com
>>> wrote:
>>>>>>> 
>>>>>>>> I had thought you could - but looking at the code recently, I don't
>>>>>>>> think you can anymore. I think that's a technical limitation more
>> than
>>>>>>>> anything though. When these changes were made, I think support for
>> that was
>>>>>>>> simply not added at the time.
>>>>>>>> 
>>>>>>>> I'm not sure exactly how straightforward it would be, but it seems
>>>>>>>> doable - as it is, the overseer will preallocate shards when first
>> creating
>>>>>>>> the collection - that's when they get named shard(n). There would
>> have to
>>>>>>>> be logic to replace shard(n) with the custom shard name when the
>> core
>>>>>>>> actually registers.
>>>>>>>> 
>>>>>>>> - Mark
>>>>>>>> 
>>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2003@gmail.com>
>> wrote:
>>>>>>>> 
>>>>>>>>> answered my own question, it now says compositeId.  What is
>>>>>>>> problematic
>>>>>>>>> though is that in addition to my shards (which are say
>> jamie-shard1)
>>>>>>>> I see
>>>>>>>>> the solr created shards (shard1).  I assume that these were created
>>>>>>>> because
>>>>>>>>> of the numShards param.  Is there no way to specify the names of
>> these
>>>>>>>>> shards?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2003@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> ah interesting....so I need to specify num shards, blow out zk and
>>>>>>>> then
>>>>>>>>>> try this again to see if things work properly now.  What is really
>>>>>>>> strange
>>>>>>>>>> is that for the most part things have worked right and on 4.2.1 I
>>>>>>>> have
>>>>>>>>>> 600,000 items indexed with no duplicates.  In any event I will
>>>>>>>> specify num
>>>>>>>>>> shards clear out zk and begin again.  If this works properly what
>>>>>>>> should
>>>>>>>>>> the router type be?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <
>> markrmiller@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit doc
>>>>>>>> router
>>>>>>>>>>> and it's up to you to distribute updates. In the past,
>> partitioning
>>>>>>>> was
>>>>>>>>>>> done on the fly - but for shard splitting and perhaps other
>>>>>>>> features, we
>>>>>>>>>>> now divvy up the hash range up front based on numShards and store
>>>>>>>> it in
>>>>>>>>>>> ZooKeeper. No numShards is now how you take complete control of
>>>>>>>> updates
>>>>>>>>>>> yourself.
>>>>>>>>>>> 
>>>>>>>>>>> - Mark
>>>>>>>>>>> 
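To illustrate the difference, here is the startup command from earlier in this thread with numShards added, so the overseer divvies up the hash range up front and you get the compositeId router (a sketch only; numShards=6 is assumed from the 6 shard cluster described elsewhere in the thread):

```shell
# Same style of startup as elsewhere in this thread, with -DnumShards added
# (assumption: 6 shards). Without it, post-4.1 you get the implicit router
# and must distribute updates yourself.
java -server -Dshard=shard5 -DcoreName=shard5-core1 \
     -DnumShards=6 \
     -Dsolr.data.dir=/solr/data/shard5-core1 \
     -Dcollection.configName=solr-conf -Dcollection=collection1 \
     -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
     -Djetty.port=7575 -DhostPort=7575 -jar start.jar
```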
>>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2003@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> The router says "implicit".  I did start from a blank zk state
>> but
>>>>>>>>>>> perhaps
>>>>>>>>>>>> I missed one of the ZkCLI commands?  One of my shards from the
>>>>>>>>>>>> clusterstate.json is shown below.  What is the process that
>> should
>>>>>>>> be
>>>>>>>>>>> done
>>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I listed
>>>>>>>> above?  My
>>>>>>>>>>>> process right now is run those ZkCLI commands and then start
>> solr
>>>>>>>> on
>>>>>>>>>>> all of
>>>>>>>>>>>> the instances with a command like this
>>>>>>>>>>>> 
>>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
>>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1
>>>>>>>>>>> -Dcollection.configName=solr-conf
>>>>>>>>>>>> -Dcollection=collection1
>>>>>>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
>>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>>>>>>>>>>>> 
>>>>>>>>>>>> I feel like maybe I'm missing a step.
>>>>>>>>>>>> 
>>>>>>>>>>>> "shard5":{
>>>>>>>>>>>>    "state":"active",
>>>>>>>>>>>>    "replicas":{
>>>>>>>>>>>>      "10.38.33.16:7575_solr_shard5-core1":{
>>>>>>>>>>>>        "shard":"shard5",
>>>>>>>>>>>>        "state":"active",
>>>>>>>>>>>>        "core":"shard5-core1",
>>>>>>>>>>>>        "collection":"collection1",
>>>>>>>>>>>>        "node_name":"10.38.33.16:7575_solr",
>>>>>>>>>>>>        "base_url":"http://10.38.33.16:7575/solr",
>>>>>>>>>>>>        "leader":"true"},
>>>>>>>>>>>>      "10.38.33.17:7577_solr_shard5-core2":{
>>>>>>>>>>>>        "shard":"shard5",
>>>>>>>>>>>>        "state":"recovering",
>>>>>>>>>>>>        "core":"shard5-core2",
>>>>>>>>>>>>        "collection":"collection1",
>>>>>>>>>>>>        "node_name":"10.38.33.17:7577_solr",
>>>>>>>>>>>>        "base_url":"http://10.38.33.17:7577/solr"}}}
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <
>> markrmiller@gmail.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have
>>>>>>>> reported
>>>>>>>>>>>>> trouble upgrading a previous zk install when this change came.
>> I
>>>>>>>>>>>>> recommended manually updating the clusterstate.json to have the
>>>>>>>> right
>>>>>>>>>>> info,
>>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to start
>>>>>>>> from a
>>>>>>>>>>> clean
>>>>>>>>>>>>> zk state.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If you don't have that range information, I think there will be
>>>>>>>>>>> trouble.
>>>>>>>>>>>>> Do you have an router type defined in the clusterstate.json?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2003@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Where is this information stored in ZK?  I don't see it in the
>>>>>>>> cluster
>>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Perhaps something with my process is broken.  What I do when I
>>>>>>>> start
>>>>>>>>>>> from
>>>>>>>>>>>>>> scratch is the following
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ZkCLI -cmd upconfig ...
>>>>>>>>>>>>>> ZkCLI -cmd linkconfig ....
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> but I don't ever explicitly create the collection.  What
>> should
>>>>>>>> the
>>>>>>>>>>> steps
>>>>>>>>>>>>>> from scratch be?  I am moving from an unreleased snapshot of
>> 4.0
>>>>>>>> so I
>>>>>>>>>>>>> never
>>>>>>>>>>>>>> did that previously either so perhaps I did create the
>>>>>>>> collection in
>>>>>>>>>>> one
>>>>>>>>>>>>> of
>>>>>>>>>>>>>> my steps to get this working but have forgotten it along the
>> way.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <
>>>>>>>> markrmiller@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up
>>>>>>>> front
>>>>>>>>>>>>> when a
>>>>>>>>>>>>>>> collection is created - each shard gets a range, which is
>>>>>>>> stored in
>>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the same id
>> on
>>>>>>>>>>>>> different
>>>>>>>>>>>>>>> shards - something very odd going on.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you reproduce.
>>>>>>>> Ideally
>>>>>>>>>>> we
>>>>>>>>>>>>>>> can capture it in a test case.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2003@gmail.com
>>> 
>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the
>>>>>>>> parameter
>>>>>>>>>>> set I
>>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>> seeing this behavior.  I've been able to duplicate it on
>> 4.2.0
>>>>>>>> by
>>>>>>>>>>>>>>> indexing
>>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I get to
>>>>>>>> 400,000
>>>>>>>>>>> or
>>>>>>>>>>>>>>> so.
>>>>>>>>>>>>>>>> I will try this on 4.2.1. to see if I see the same behavior
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <
>>>>>>>> jej2003@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Since I don't have that many items in my index I exported
>> all
>>>>>>>> of
>>>>>>>>>>> the
>>>>>>>>>>>>>>> keys
>>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that checks
>> for
>>>>>>>>>>>>>>> duplicates.
>>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a grep of
>> the
>>>>>>>>>>> files
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the wrong
>>>>>>>> places.
>>>>>>>>>>>>> If
>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and shard
>> 5.
>>>>>>>> Is
>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> possible that the hash is being calculated taking into
>>>>>>>> account only
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> "live" nodes?  I know that we don't specify the numShards
>>>>>>>> param @
>>>>>>>>>>>>>>> startup
>>>>>>>>>>>>>>>>> so could this be what is happening?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>>>>>>>>>>>>>>>>> shard1-core1:0
>>>>>>>>>>>>>>>>> shard1-core2:0
>>>>>>>>>>>>>>>>> shard2-core1:0
>>>>>>>>>>>>>>>>> shard2-core2:0
>>>>>>>>>>>>>>>>> shard3-core1:1
>>>>>>>>>>>>>>>>> shard3-core2:1
>>>>>>>>>>>>>>>>> shard4-core1:0
>>>>>>>>>>>>>>>>> shard4-core2:0
>>>>>>>>>>>>>>>>> shard5-core1:1
>>>>>>>>>>>>>>>>> shard5-core2:1
>>>>>>>>>>>>>>>>> shard6-core1:0
>>>>>>>>>>>>>>>>> shard6-core2:0
>>>>>>>>>>>>>>>>> 
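The same cross-shard duplicate check can be sketched without a custom Java program, as a sort/uniq pass over the exported key files (stand-in dumps shown here; the real files would be one document id per line, one file per core):

```shell
# Stand-in key dumps: one document id per line, one file per leader core.
printf 'id-a\nid-b\n' > shard1-core1
printf 'id-b\nid-c\n' > shard3-core1
# An id appearing in more than one shard's dump is a cross-shard duplicate.
sort shard1-core1 shard3-core1 | uniq -d   # prints: id-b
```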
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <
>>>>>>>> jej2003@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I just
>>>>>>>> indexed
>>>>>>>>>>>>> 300,000
>>>>>>>>>>>>>>>>>> items, and some how 300,020 ended up in the index.  I
>> thought
>>>>>>>>>>>>> perhaps I
>>>>>>>>>>>>>>>>>> messed something up so I started the indexing again and
>>>>>>>> indexed
>>>>>>>>>>>>> another
>>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs.  Is there a good way to
>> find
>>>>>>>>>>>>> possibile
>>>>>>>>>>>>>>>>>> duplicates?  I had tried to facet on key (our id field)
>> but
>>>>>>>> that
>>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>>> give me anything with more than a count of 1.
>>>>>>>>>>>>>>>>>> 
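One hedged way to make faceting surface duplicates (assuming the id field is named "key" as above; host, port, and collection name are placeholders) is to ask only for values occurring more than once and lift the default facet limit, which would otherwise truncate the list at 100 values:

```shell
# Sketch: return only facet values on the id field that occur 2+ times
# across the whole collection; facet.limit=-1 removes the default cap.
curl "http://10.38.33.16:7575/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=key&facet.mincount=2&facet.limit=-1"
```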
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <
>>>>>>>> jej2003@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to go
>>>>>>>> again.
>>>>>>>>>>> I
>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the
>> problem on
>>>>>>>>>>> 4.2.0
>>>>>>>>>>>>>>> and then
>>>>>>>>>>>>>>>>>>> I'll try on 4.2.1
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <
>>>>>>>>>>> markrmiller@gmail.com
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> No, not that I know if, which is why I say we need to
>> get
>>>>>>>> to the
>>>>>>>>>>>>>>> bottom
>>>>>>>>>>>>>>>>>>>> of it.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <
>>>>>>>> jej2003@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Mark
>>>>>>>>>>>>>>>>>>>>> It's there a particular jira issue that you think may
>>>>>>>> address
>>>>>>>>>>>>> this?
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>> read
>>>>>>>>>>>>>>>>>>>>> through it quickly but didn't see one that jumped out
>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <
>>>>>>>> jej2003@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did
>>>>>>>> nothing.  I
>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>> clear
>>>>>>>>>>>>>>>>>>>>>> the index and try4.2.1. I will save off the logs and
>> see
>>>>>>>> if
>>>>>>>>>>> there
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> anything else odd
>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <
>>>>>>>> markrmiller@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best
>> to
>>>>>>>> start
>>>>>>>>>>>>>>>>>>>> tracking in
>>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back
>> again.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really
>> need
>>>>>>>> to
>>>>>>>>>>> get
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's
>> fixed in
>>>>>>>>>>> 4.2.1
>>>>>>>>>>>>>>>>>>>> (spreading
>>>>>>>>>>>>>>>>>>>>>>> to mirrors now).
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <
>>>>>>>> jej2003@gmail.com
>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question.  Is there
>>>>>>>> anything
>>>>>>>>>>>>> else
>>>>>>>>>>>>>>>>>>>> that I
>>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug?  I'd
>> be
>>>>>>>> happy
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> troll
>>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information is
>>>>>>>> needed, just
>>>>>>>>>>>>> let
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>>>>> know.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to fix
>>>>>>>> this.
>>>>>>>>>>> Is it
>>>>>>>>>>>>>>>>>>>>>>> required to
>>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let solr
>> resync
>>>>>>>>>>> things?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <
>>>>>>>>>>>>> jej2003@gmail.com
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here....
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues
>>>>>>>> with...
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM
>>>>>>>> org.apache.solr.common.SolrException
>>>>>>>>>>>>> log
>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException
>>>>>>>>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned
>>>>>>>>>>>>>>>>>>>>>>>>> non ok status:503, message:Service Unavailable
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>> java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <
>>>>>>>>>>>>>>> jej2003@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM
>>>>>>>>>>> org.apache.solr.common.SolrException
>>>>>>>>>>>>> log
>>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException:
>>>>>>>> ClusterState
>>>>>>>>>>>>> says
>>>>>>>>>>>>>>>>>>>> we are
>>>>>>>>>>>>>>>>>>>>>>>>>> the leader, but locally we don't think so
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>> 
>> org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>>>>>>>>>>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <
>>>>>>>>>>>>>>> jej2003@gmail.com
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point
>>>>>>>> there
>>>>>>>>>>> were
>>>>>>>>>>>>>>>>>>>> shards
>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>> went down.  I am seeing things like what is
>> below.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> NFO: A cluster state change: WatchedEvent
>>>>>>>>>>>>> state:SyncConnected
>>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has
>>>>>>>> occurred -
>>>>>>>>>>>>>>>>>>>>>>> updating... (live
>>>>>>>>>>>>>>>>>>>>>>>>>>> nodes size: 12)
>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM
>>>>>>>>>>>>>>>>>>>> org.apache.solr.common.cloud.ZkStateReader$3
>>>>>>>>>>>>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM
>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext
>>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess
>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM
>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext
>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader
>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM
>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext
>>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader
>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's
>> okay
>>>>>>>> to be
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> leader.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM
>>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext
>>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess
>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <
>>>>>>>>>>>>>>>>>>>> markrmiller@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of
>>>>>>>> apply
>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>>> Peersync
>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not look at that - it looks at version
>>>>>>>> numbers for
>>>>>>>>>>>>>>>>>>>> updates in
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> transaction log - it compares the last 100 of
>> them
>>>>>>>> on
>>>>>>>>>>>>> leader
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> replica.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to
>> have
>>>>>>>>>>> versions
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>> the leader
>>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. Have you scanned the logs for any
>>>>>>>> interesting
>>>>>>>>>>>>>>>>>>>> exceptions?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing?
>>>>>>>> Did
>>>>>>>>>>> any zk
>>>>>>>>>>>>>>>>>>>> session
>>>>>>>>>>>>>>>>>>>>>>>>>>>> timeouts occur?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <
>>>>>>>>>>>>> jej2003@gmail.com
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr
>> cluster
>>>>>>>> to
>>>>>>>>>>> 4.2
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> noticed a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> strange issue while testing today.
>> Specifically
>>>>>>>> the
>>>>>>>>>>>>> replica
>>>>>>>>>>>>>>>>>>>> has a
>>>>>>>>>>>>>>>>>>>>>>>>>>>> higher
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> version than the master which is causing the
>>>>>>>> index to
>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>> replicate.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer documents
>>>>>>>> than
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> master.
>>>>>>>>>>>>>>>>>>>>>>> What
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could cause this and how can I resolve it
>> short of
>>>>>>>>>>> taking
>>>>>>>>>>>>>>>>>>>> down the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> index
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and scping the right version in?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified:about an hour ago
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164880
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164880
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:2387
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:23
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164773
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164773
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:3001
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:30
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the replicas log it says this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>> config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM
>>>>>>>> org.apache.solr.update.PeerSync
>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> url=http://10.38.33.17:7577/solr START replicas=[
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.16:7575/solr/dsc-shard5-core1/
>> ]
>>>>>>>>>>>>>>> nUpdates=100
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM
>>>>>>>> org.apache.solr.update.PeerSync
>>>>>>>>>>>>>>>>>>>>>>> handleVersions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=
>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Received 100 versions from
>>>>>>>>>>>>>>>>>>>> 10.38.33.16:7575/solr/dsc-shard5-core1/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM
>>>>>>>> org.apache.solr.update.PeerSync
>>>>>>>>>>>>>>>>>>>>>>> handleVersions
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=
>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://10.38.33.17:7577/solr  Our
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> versions are newer.
>>>>>>>> ourLowThreshold=1431233788792274944
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM
>>>>>>>> org.apache.solr.update.PeerSync
>>>>>>>>>>>>> sync
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> url=http://10.38.33.17:7577/solr DONE. sync succeeded
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it
>> has a
>>>>>>>>>>> newer
>>>>>>>>>>>>>>>>>>>> version of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> index so it aborts.  This happened while
>> having 10
>>>>>>>>>>> threads
>>>>>>>>>>>>>>>>>>>> indexing
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 10,000
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> items writing to a 6 shard (1 replica each)
>>>>>>>> cluster.
>>>>>>>>>>> Any
>>>>>>>>>>>>>>>>>>>> thoughts
>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or what I should look for would be appreciated.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 

