lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Thu, 04 Apr 2013 00:54:31 GMT
Thanks I will try that.


On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller <markrmiller@gmail.com> wrote:

>
>
> On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
> > I am not using the concurrent low pause garbage collector, I could look at
> > switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC
> > correct?
>
> Right - if you don't do that, the default is almost always the throughput
> collector (I've only seen OSX buck this trend when Apple handled Java).
> That means stop-the-world garbage collections, so with larger heaps, that
> can be a fair amount of time that no threads can run. It's not that great
> for something as interactive as search generally is anyway, but it's also
> not that great when added to heavy load and a 15 sec session timeout
> between Solr and ZK.
>
>
> The below is odd - a replica node is waiting for the leader to see it as
> recovering and live - live means it has created an ephemeral node for that
> Solr corecontainer in zk - it's very strange if that didn't happen, unless
> this happened during shutdown or something.
>
> >
> > I also just had a shard go down and am seeing this in the log
> >
> > SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
> > down for 10.38.33.17:7576_solr but I still do not see the requested state.
> > I see state: recovering live:false
> >        at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890)
> >        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> >        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> >        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> >        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> >        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >
> > Nothing other than this in the log jumps out as interesting though.
> >
> >
> > On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >
> >> This shouldn't be a problem though, if things are working as they are
> >> supposed to. Another node should simply take over as the overseer and
> >> continue processing the work queue. It's just best if you configure so that
> >> session timeouts don't happen unless a node is really down. On the other
> >> hand, it's nicer to detect that faster. Your tradeoff to make.
> >>
> >> - Mark
> >>
> >> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>
> >>> Yeah. Are you using the concurrent low pause garbage collector?
> >>>
> >>> This means the overseer wasn't able to communicate with zk for 15
> >>> seconds - due to load or gc or whatever. If you can't resolve the root
> >>> cause of that, or the load just won't allow for it, next best thing you can
> >>> do is raise it to 30 seconds.
> >>>
> >>> - Mark
> >>>
> >>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>
> >>>> I am occasionally seeing this in the log, is this just a timeout issue?
> >>>> Should I be increasing the zk client timeout?
> >>>>
> >>>> WARNING: Overseer cannot talk to ZK
> >>>> Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
> >>>> INFO: Watcher fired on path: null state: Expired type None
> >>>> Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater run
> >>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop
> >>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
> >>>> KeeperErrorCode = Session expired for /overseer/queue
> >>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> >>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >>>>      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
> >>>>      at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236)
> >>>>      at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233)
> >>>>      at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
> >>>>      at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233)
> >>>>      at org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89)
> >>>>      at org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131)
> >>>>      at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326)
> >>>>      at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128)
> >>>>      at java.lang.Thread.run(Thread.java:662)
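One way to answer the "did any zk session timeouts occur?" question is to count the relevant warnings across a log. What follows is only a minimal sketch, assuming a plain-text log in the format quoted above; the function name, the pattern labels and the sample lines are illustrative and not part of Solr.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts quoted in this thread.
PATTERNS = {
    "session_expired": re.compile(r"SessionExpiredException|state: Expired"),
    "lost_zk": re.compile(r"cannot talk to ZK"),
}


def count_zk_events(lines):
    """Count log lines matching each pattern (one count per line per pattern)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines copied from the excerpt above; a real run would read a log file.
sample = [
    "WARNING: Overseer cannot talk to ZK",
    "INFO: Watcher fired on path: null state: Expired type None",
    "WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop",
    "org.apache.zookeeper.KeeperException$SessionExpiredException:",
]
counts = count_zk_events(sample)
print(dict(counts))
```

If the counts are non-zero only around periods of heavy indexing, that would be consistent with the GC pause or load explanation rather than a node actually going down.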
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>
> >>>>> just an update, I'm at 1M records now with no issues.  This looks
> >>>>> promising as to the cause of my issues, thanks for the help.  Is the
> >>>>> routing method with numShards documented anywhere?  I know numShards is
> >>>>> documented but I didn't know that the routing changed if you don't specify
> >>>>> it.
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>
> >>>>>> with these changes things are looking good, I'm up to 600,000 documents
> >>>>>> without any issues as of right now.  I'll keep going and add more to see if
> >>>>>> I find anything.
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>
> >>>>>>> ok, so that's not a deal breaker for me.  I just changed it to match the
> >>>>>>> shards that are auto created and it looks like things are happy.  I'll go
> >>>>>>> ahead and try my test to see if I can get things out of sync.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I had thought you could - but looking at the code recently, I don't
> >>>>>>>> think you can anymore. I think that's a technical limitation more than
> >>>>>>>> anything though. When these changes were made, I think support for that was
> >>>>>>>> simply not added at the time.
> >>>>>>>>
> >>>>>>>> I'm not sure exactly how straightforward it would be, but it seems
> >>>>>>>> doable - as it is, the overseer will preallocate shards when first creating
> >>>>>>>> the collection - that's when they get named shard(n). There would have to
> >>>>>>>> be logic to replace shard(n) with the custom shard name when the core
> >>>>>>>> actually registers.
> >>>>>>>>
> >>>>>>>> - Mark
> >>>>>>>>
> >>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> answered my own question, it now says compositeId.  What is problematic
> >>>>>>>>> though is that in addition to my shards (which are say jamie-shard1) I see
> >>>>>>>>> the solr created shards (shard1).  I assume that these were created because
> >>>>>>>>> of the numShards param.  Is there no way to specify the names of these
> >>>>>>>>> shards?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> ah interesting....so I need to specify num shards, blow out zk and then
> >>>>>>>>>> try this again to see if things work properly now.  What is really strange
> >>>>>>>>>> is that for the most part things have worked right and on 4.2.1 I have
> >>>>>>>>>> 600,000 items indexed with no duplicates.  In any event I will specify num
> >>>>>>>>>> shards clear out zk and begin again.  If this works properly what should
> >>>>>>>>>> the router type be?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit doc router
> >>>>>>>>>>> and it's up to you to distribute updates. In the past, partitioning was
> >>>>>>>>>>> done on the fly - but for shard splitting and perhaps other features, we
> >>>>>>>>>>> now divvy up the hash range up front based on numShards and store it in
> >>>>>>>>>>> ZooKeeper. No numShards is now how you take complete control of updates
> >>>>>>>>>>> yourself.
> >>>>>>>>>>>
> >>>>>>>>>>> - Mark
> >>>>>>>>>>>
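The idea of dividing the hash range up front can be illustrated with a short sketch. This is not Solr's implementation: the unsigned 32-bit range, the `shardN` naming, and the use of `zlib.crc32` as a stand-in hash are all assumptions made for the example. It only shows why a fixed set of ranges sends a given id to exactly one shard.

```python
import zlib

# Unsigned 32-bit hash space, chosen for this sketch only.
HASH_MIN, HASH_MAX = 0, 0xFFFFFFFF


def partition_range(num_shards):
    """Split the hash space into num_shards contiguous, non-overlapping ranges."""
    size = (HASH_MAX - HASH_MIN + 1) // num_shards
    ranges = []
    start = HASH_MIN
    for i in range(num_shards):
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_MAX if i == num_shards - 1 else start + size - 1
        ranges.append((f"shard{i + 1}", start, end))
        start = end + 1
    return ranges


def route(doc_id, ranges):
    """Return the single shard whose range contains the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, not Solr's
    for name, start, end in ranges:
        if start <= h <= end:
            return name
    raise ValueError("hash outside all ranges")


ranges = partition_range(6)
shard = route("7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de", ranges)
print(shard)
```

Because the ranges are computed once and stored, routing does not depend on which nodes happen to be live at the time of the update.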
> >>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> The router says "implicit".  I did start from a blank zk state but perhaps
> >>>>>>>>>>>> I missed one of the ZkCLI commands?  One of my shards from the
> >>>>>>>>>>>> clusterstate.json is shown below.  What is the process that should be done
> >>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I listed above?  My
> >>>>>>>>>>>> process right now is run those ZkCLI commands and then start solr on all of
> >>>>>>>>>>>> the instances with a command like this
> >>>>>>>>>>>>
> >>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
> >>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf
> >>>>>>>>>>>> -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
> >>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
> >>>>>>>>>>>>
> >>>>>>>>>>>> I feel like maybe I'm missing a step.
> >>>>>>>>>>>>
> >>>>>>>>>>>> "shard5":{
> >>>>>>>>>>>>    "state":"active",
> >>>>>>>>>>>>    "replicas":{
> >>>>>>>>>>>>      "10.38.33.16:7575_solr_shard5-core1":{
> >>>>>>>>>>>>        "shard":"shard5",
> >>>>>>>>>>>>        "state":"active",
> >>>>>>>>>>>>        "core":"shard5-core1",
> >>>>>>>>>>>>        "collection":"collection1",
> >>>>>>>>>>>>        "node_name":"10.38.33.16:7575_solr",
> >>>>>>>>>>>>        "base_url":"http://10.38.33.16:7575/solr",
> >>>>>>>>>>>>        "leader":"true"},
> >>>>>>>>>>>>      "10.38.33.17:7577_solr_shard5-core2":{
> >>>>>>>>>>>>        "shard":"shard5",
> >>>>>>>>>>>>        "state":"recovering",
> >>>>>>>>>>>>        "core":"shard5-core2",
> >>>>>>>>>>>>        "collection":"collection1",
> >>>>>>>>>>>>        "node_name":"10.38.33.17:7577_solr",
> >>>>>>>>>>>>        "base_url":"http://10.38.33.17:7577/solr"}}}
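A quick way to inspect a shard entry like the one above is to parse it and list the leader, any replica that is not active, and whether range information is present. This is a sketch only: the JSON is the quoted entry wrapped in an outer object so it parses, the `summarize` helper is hypothetical, and the `range` key name is an assumption about how the hash range would appear in clusterstate.json.

```python
import json

# The shard entry quoted above, wrapped in braces so it parses as one object.
CLUSTERSTATE = """
{"shard5":{
    "state":"active",
    "replicas":{
      "10.38.33.16:7575_solr_shard5-core1":{
        "shard":"shard5",
        "state":"active",
        "core":"shard5-core1",
        "collection":"collection1",
        "node_name":"10.38.33.16:7575_solr",
        "base_url":"http://10.38.33.16:7575/solr",
        "leader":"true"},
      "10.38.33.17:7577_solr_shard5-core2":{
        "shard":"shard5",
        "state":"recovering",
        "core":"shard5-core2",
        "collection":"collection1",
        "node_name":"10.38.33.17:7577_solr",
        "base_url":"http://10.38.33.17:7577/solr"}}}}
"""


def summarize(shards):
    """Return (leaders, not_active, shards_without_range) for a shard mapping."""
    leaders, not_active, no_range = {}, [], []
    for shard_name, shard in shards.items():
        if "range" not in shard:  # "range" key name is an assumption
            no_range.append(shard_name)
        for replica in shard.get("replicas", {}).values():
            if replica.get("leader") == "true":
                leaders[shard_name] = replica["core"]
            if replica.get("state") != "active":
                not_active.append(replica["core"])
    return leaders, not_active, no_range


shards = json.loads(CLUSTERSTATE)
leaders, not_active, no_range = summarize(shards)
print(leaders, not_active, no_range)
```

For the entry above this reports shard5-core1 as leader, shard5-core2 as not active, and shard5 as having no range entry.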
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have reported
> >>>>>>>>>>>>> trouble upgrading a previous zk install when this change came. I
> >>>>>>>>>>>>> recommended manually updating the clusterstate.json to have the right info,
> >>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to start from a clean
> >>>>>>>>>>>>> zk state.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If you don't have that range information, I think there will be trouble.
> >>>>>>>>>>>>> Do you have a router type defined in the clusterstate.json?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Where is this information stored in ZK?  I don't see it in the cluster
> >>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Perhaps something with my process is broken.  What I do when I start from
> >>>>>>>>>>>>>> scratch is the following
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ZkCLI -cmd upconfig ...
> >>>>>>>>>>>>>> ZkCLI -cmd linkconfig ....
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> but I don't ever explicitly create the collection.  What should the steps
> >>>>>>>>>>>>>> from scratch be?  I am moving from an unreleased snapshot of 4.0 so I never
> >>>>>>>>>>>>>> did that previously either so perhaps I did create the collection in one of
> >>>>>>>>>>>>>> my steps to get this working but have forgotten it along the way.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a
> >>>>>>>>>>>>>>> collection is created - each shard gets a range, which is stored in
> >>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the same id on different
> >>>>>>>>>>>>>>> shards - something very odd going on.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you reproduce. Ideally we
> >>>>>>>>>>>>>>> can capture it in a test case.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the parameter set I am
> >>>>>>>>>>>>>>>> seeing this behavior.  I've been able to duplicate it on 4.2.0 by indexing
> >>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I get to 400,000 or so.
> >>>>>>>>>>>>>>>> I will try this on 4.2.1 to see if I see the same behavior
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Since I don't have that many items in my index I exported all of the keys
> >>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that checks for duplicates.
> >>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a grep of the files for
> >>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the wrong places.  If you
> >>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and shard 5.  Is it
> >>>>>>>>>>>>>>>>> possible that the hash is being calculated taking into account only the
> >>>>>>>>>>>>>>>>> "live" nodes?  I know that we don't specify the numShards param @ startup
> >>>>>>>>>>>>>>>>> so could this be what is happening?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
> >>>>>>>>>>>>>>>>> shard1-core1:0
> >>>>>>>>>>>>>>>>> shard1-core2:0
> >>>>>>>>>>>>>>>>> shard2-core1:0
> >>>>>>>>>>>>>>>>> shard2-core2:0
> >>>>>>>>>>>>>>>>> shard3-core1:1
> >>>>>>>>>>>>>>>>> shard3-core2:1
> >>>>>>>>>>>>>>>>> shard4-core1:0
> >>>>>>>>>>>>>>>>> shard4-core2:0
> >>>>>>>>>>>>>>>>> shard5-core1:1
> >>>>>>>>>>>>>>>>> shard5-core2:1
> >>>>>>>>>>>>>>>>> shard6-core1:0
> >>>>>>>>>>>>>>>>> shard6-core2:0
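The duplicate check described above can be sketched in a few lines. This is not the Java program mentioned in the message, which was not posted; it is a hypothetical equivalent that assumes one exported key list per core, with core names of the form `shardN-coreM` as in the grep output. Replicas of the same shard are expected to share keys, so cores are grouped by shard before comparing.

```python
from collections import defaultdict


def find_cross_shard_duplicates(keys_by_core):
    """Return {key: [shards]} for every key that appears on more than one shard.

    keys_by_core maps a core name such as "shard3-core1" to an iterable of keys.
    Cores are grouped by the shard prefix before the first "-".
    """
    shards_for_key = defaultdict(set)
    for core, keys in keys_by_core.items():
        shard = core.split("-", 1)[0]
        for key in keys:
            shards_for_key[key].add(shard)
    return {k: sorted(s) for k, s in shards_for_key.items() if len(s) > 1}


# Illustrative data shaped like the grep result above; "a" and "b" are made up.
dup = "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de"
keys_by_core = {
    "shard1-core1": ["a"],
    "shard1-core2": ["a"],
    "shard3-core1": [dup, "b"],
    "shard3-core2": [dup, "b"],
    "shard5-core1": [dup],
    "shard5-core2": [dup],
}
duplicates = find_cross_shard_duplicates(keys_by_core)
print(duplicates)
```

In practice each list would be read from the exported key file for that core rather than written inline.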
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I just indexed 300,000
> >>>>>>>>>>>>>>>>>> items, and somehow 300,020 ended up in the index.  I thought perhaps I
> >>>>>>>>>>>>>>>>>> messed something up so I started the indexing again and indexed another
> >>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs.  Is there a good way to find possible
> >>>>>>>>>>>>>>>>>> duplicates?  I had tried to facet on key (our id field) but that didn't
> >>>>>>>>>>>>>>>>>> give me anything with more than a count of 1.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to go again.  I am
> >>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the problem on 4.2.0 and then
> >>>>>>>>>>>>>>>>>>> I'll try on 4.2.1
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> No, not that I know of, which is why I say we need to get to the bottom
> >>>>>>>>>>>>>>>>>>>> of it.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Mark
> >>>>>>>>>>>>>>>>>>>>> Is there a particular jira issue that you think may address this? I read
> >>>>>>>>>>>>>>>>>>>>> through it quickly but didn't see one that jumped out
> >>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing.  I can clear
> >>>>>>>>>>>>>>>>>>>>>> the index and try 4.2.1. I will save off the logs and see if there is
> >>>>>>>>>>>>>>>>>>>>>> anything else odd
> >>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmiller@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have said.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best to start tracking in
> >>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to get to the
> >>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
> >>>>>>>>>>>>>>>>>>>>>>> to mirrors now).
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question.  Is there anything else that I
> >>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug?  I'd be happy to troll
> >>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information is needed, just let me know.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to fix this.  Is it required to
> >>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let solr resync things?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here....
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
> >>>>>>>>>>>>>>>>>>>>>>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
> >>>>>>>>>>>>>>>>>>>>>>>>> status:503, message:Service Unavailable
> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>>>>>>>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
> >>>>>>>>>>>>>>>>>>>>>>>>>> the leader, but locally we don't think so
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some point there were shards that
> >>>>>>>>>>>>>>>>>>>>>>>>>>> went down.  I am seeing things like what is below.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected
> >>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
> >>>>>>>>>>>>>>>>>>>>>>>>>>> nodes size: 12)
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here. Peersync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not look at that - it looks at version numbers for updates in the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> transaction log - it compares the last 100 of them on leader and replica.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to have versions that the leader
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. Have you scanned the logs for any interesting exceptions?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any zk session
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> timeouts occur?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark
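The comparison described above can be pictured with a small sketch. This is not Solr's PeerSync code; it only illustrates the stated idea of looking at the most recent update versions on each side and reporting those the replica has that the leader lacks. The function name, the window default taken from the "last 100" remark, and the sample version numbers are all hypothetical.

```python
def versions_missing_on_leader(leader_versions, replica_versions, window=100):
    """Return versions in the replica's recent window that the leader's window lacks.

    Both arguments are lists of update version numbers in the order they were
    applied; only the last `window` entries of each are compared.
    """
    leader_recent = set(leader_versions[-window:])
    return [v for v in replica_versions[-window:] if v not in leader_recent]


# Made-up version numbers: the replica saw two updates the leader never logged.
leader = list(range(1, 201))
replica = list(range(1, 199)) + [500, 501]
extra = versions_missing_on_leader(leader, replica)
print(extra)
```

A non-empty result corresponds to the situation described in the message, where the replica appears to hold updates that the leader does not.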
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> strange issue while testing today.  Specifically the replica has a higher
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> version than the master which is causing the index to not replicate.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Because of this the replica has fewer documents than the master.  What
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> could cause this and how can I resolve it short of taking down the index
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> and scping the right version in?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified:about an hour ago
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164880
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164880
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:2387
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:23
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164773
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164773
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:3001
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:30
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the replica's log it says this:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> versions are newer. ourLowThreshold=1431233788792274944
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> otherHigh=1431233789440294912
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> which again seems to point that it thinks it has a newer version
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the index so it aborts. This happened while having 10 threads
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> indexing 10,000 items writing to a 6 shard (1 replica each)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cluster. Any thoughts on this or what I should look for would be
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> appreciated.
>
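
As an aid to reading the PeerSync thresholds quoted above, here is a minimal sketch for decoding them. It assumes Solr's update versions are built from a millisecond wall clock shifted left by 20 bits, with the low bits acting as a counter. That layout is an assumption about Solr internals, not something stated in this thread, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: the low 20 bits of an update version are a counter and the
# remaining high bits are a wall-clock time in milliseconds.
CLOCK_SHIFT = 20


def version_to_datetime(version: int) -> datetime:
    """Approximate UTC time at which an update version was assigned."""
    # abs() because delete operations may be logged with negative versions
    # (also an assumption for this sketch).
    millis = abs(version) >> CLOCK_SHIFT
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def version_gap_millis(older: int, newer: int) -> int:
    """Wall-clock distance in milliseconds between two update versions."""
    return (abs(newer) >> CLOCK_SHIFT) - (abs(older) >> CLOCK_SHIFT)


# Values copied from the "Our versions are newer" log line in the thread.
OUR_LOW_THRESHOLD = 1431233788792274944
OTHER_HIGH = 1431233789440294912

if __name__ == "__main__":
    print(version_to_datetime(OUR_LOW_THRESHOLD).isoformat())
    print(version_to_datetime(OTHER_HIGH).isoformat())
    print(version_gap_millis(OUR_LOW_THRESHOLD, OTHER_HIGH), "ms apart")
```

Under that assumption the two thresholds decode to times on 2 April 2013 (UTC) that are 618 ms apart. That is consistent with Mark's point that PeerSync compares per-update versions from the transaction log, not the index version shown in the admin UI.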
