Subject: Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
From: Jamie Johnson <jej2003@gmail.com>
To: solr-user@lucene.apache.org
Date: Wed, 3 Apr 2013 20:54:31 -0400

Thanks I will try that.

On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller wrote:
>
> On Apr 3, 2013, at 8:17 PM, Jamie Johnson wrote:
>
> > I am not using the concurrent low pause garbage collector, I could look at
> > switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC
> > correct?
>
> Right - if you don't do that, the default is almost always the throughput
> collector (I've only seen OSX buck this trend when apple handled java).
> That means stop the world garbage collections, so with larger heaps, that > can be a fair amount of time that no threads can run. It's not that great > for something as interactive as search generally is anyway, but it's always > not that great when added to heavy load and a 15 sec session timeout > between solr and zk. > > > The below is odd - a replica node is waiting for the leader to see it as > recovering and live - live means it has created an ephemeral node for that > Solr corecontainer in zk - it's very strange if that didn't happen, unless > this happened during shutdown or something. > > > > > I also just had a shard go down and am seeing this in the log > > > > SEVERE: org.apache.solr.common.SolrException: I was asked to wait on > state > > down for 10.38.33.17:7576_solr but I still do not see the requested > state. > > I see state: recovering live:false > > at > > > org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890) > > at > > > org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186) > > at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) > > at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > > > > Nothing other than this in the log jumps out as interesting though. > > > > > > On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller > wrote: > > > >> This shouldn't be a problem though, if things are working as they are > >> supposed to. Another node should simply take over as the overseer and > >> continue processing the work queue. It's just best if you configure so > that > >> session timeouts don't happen unless a node is really down. On the other > >> hand, it's nicer to detect that faster. Your tradeoff to make. > >> > >> - Mark > >> > >> On Apr 3, 2013, at 7:46 PM, Mark Miller wrote: > >> > >>> Yeah. Are you using the concurrent low pause garbage collector? > >>> > >>> This means the overseer wasn't able to communicate with zk for 15 > >> seconds - due to load or gc or whatever. If you can't resolve the root > >> cause of that, or the load just won't allow for it, next best thing you > can > >> do is raise it to 30 seconds. > >>> > >>> - Mark > >>> > >>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson wrote: > >>> > >>>> I am occasionally seeing this in the log, is this just a timeout > issue? > >>>> Should I be increasing the zk client timeout? 
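
[A rough sketch of the startup switches being discussed above: the CMS collector instead of the default throughput collector, and a 30 second ZooKeeper client timeout instead of the 15 second default. The heap sizes are placeholders, and -DzkClientTimeout assumes the stock 4.x example solr.xml, which reads the timeout from a zkClientTimeout system property with a 15000 ms default - adjust both to your own setup.]

    java -server -Xms4g -Xmx4g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -DzkClientTimeout=30000 \
         -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
         -jar start.jar
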
> >>>> > >>>> WARNING: Overseer cannot talk to ZK > >>>> Apr 3, 2013 11:14:25 PM > >>>> org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process > >>>> INFO: Watcher fired on path: null state: Expired type None > >>>> Apr 3, 2013 11:14:25 PM > >> org.apache.solr.cloud.Overseer$ClusterStateUpdater > >>>> run > >>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop > >>>> org.apache.zookeeper.KeeperException$SessionExpiredException: > >>>> KeeperErrorCode = Session expired for /overseer/queue > >>>> at > >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:127) > >>>> at > >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > >>>> at > org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468) > >>>> at > >>>> > >> > org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236) > >>>> at > >>>> > >> > org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233) > >>>> at > >>>> > >> > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65) > >>>> at > >>>> > >> > org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233) > >>>> at > >>>> > >> > org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89) > >>>> at > >>>> > >> > org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131) > >>>> at > >>>> org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326) > >>>> at > >>>> > >> > org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128) > >>>> at java.lang.Thread.run(Thread.java:662) > >>>> > >>>> > >>>> > >>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson > >> wrote: > >>>> > >>>>> just an update, I'm at 1M records now with no issues. This looks > >>>>> promising as to the cause of my issues, thanks for the help. Is the > >>>>> routing method with numShards documented anywhere? I know numShards > is > >>>>> documented but I didn't know that the routing changed if you don't > >> specify > >>>>> it. > >>>>> > >>>>> > >>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson > >> wrote: > >>>>> > >>>>>> with these changes things are looking good, I'm up to 600,000 > >> documents > >>>>>> without any issues as of right now. I'll keep going and add more to > >> see if > >>>>>> I find anything. > >>>>>> > >>>>>> > >>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson > >> wrote: > >>>>>> > >>>>>>> ok, so that's not a deal breaker for me. I just changed it to > match > >> the > >>>>>>> shards that are auto created and it looks like things are happy. > >> I'll go > >>>>>>> ahead and try my test to see if I can get things out of sync. > >>>>>>> > >>>>>>> > >>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller >>> wrote: > >>>>>>> > >>>>>>>> I had thought you could - but looking at the code recently, I > don't > >>>>>>>> think you can anymore. I think that's a technical limitation more > >> than > >>>>>>>> anything though. When these changes were made, I think support for > >> that was > >>>>>>>> simply not added at the time. > >>>>>>>> > >>>>>>>> I'm not sure exactly how straightforward it would be, but it seems > >>>>>>>> doable - as it is, the overseer will preallocate shards when first > >> creating > >>>>>>>> the collection - that's when they get named shard(n). There would > >> have to > >>>>>>>> be logic to replace shard(n) with the custom shard name when the > >> core > >>>>>>>> actually registers. 
> >>>>>>>> > >>>>>>>> - Mark > >>>>>>>> > >>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson > >> wrote: > >>>>>>>> > >>>>>>>>> answered my own question, it now says compositeId. What is > >>>>>>>> problematic > >>>>>>>>> though is that in addition to my shards (which are say > >> jamie-shard1) > >>>>>>>> I see > >>>>>>>>> the solr created shards (shard1). I assume that these were > created > >>>>>>>> because > >>>>>>>>> of the numShards param. Is there no way to specify the names of > >> these > >>>>>>>>> shards? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson > > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> ah interesting....so I need to specify num shards, blow out zk > and > >>>>>>>> then > >>>>>>>>>> try this again to see if things work properly now. What is > really > >>>>>>>> strange > >>>>>>>>>> is that for the most part things have worked right and on 4.2.1 > I > >>>>>>>> have > >>>>>>>>>> 600,000 items indexed with no duplicates. In any event I will > >>>>>>>> specify num > >>>>>>>>>> shards clear out zk and begin again. If this works properly > what > >>>>>>>> should > >>>>>>>>>> the router type be? > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller < > >> markrmiller@gmail.com> > >>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit > doc > >>>>>>>> router > >>>>>>>>>>> and it's up to you to distribute updates. In the past, > >> partitioning > >>>>>>>> was > >>>>>>>>>>> done on the fly - but for shard splitting and perhaps other > >>>>>>>> features, we > >>>>>>>>>>> now divvy up the hash range up front based on numShards and > store > >>>>>>>> it in > >>>>>>>>>>> ZooKeeper. No numShards is now how you take complete control of > >>>>>>>> updates > >>>>>>>>>>> yourself. > >>>>>>>>>>> > >>>>>>>>>>> - Mark > >>>>>>>>>>> > >>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson > >>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> The router says "implicit". I did start from a blank zk state > >> but > >>>>>>>>>>> perhaps > >>>>>>>>>>>> I missed one of the ZkCLI commands? One of my shards from the > >>>>>>>>>>>> clusterstate.json is shown below. What is the process that > >> should > >>>>>>>> be > >>>>>>>>>>> done > >>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I listed > >>>>>>>> above? My > >>>>>>>>>>>> process right now is run those ZkCLI commands and then start > >> solr > >>>>>>>> on > >>>>>>>>>>> all of > >>>>>>>>>>>> the instances with a command like this > >>>>>>>>>>>> > >>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1 > >>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1 > >>>>>>>>>>> -Dcollection.configName=solr-conf > >>>>>>>>>>>> -Dcollection=collection1 > >>>>>>>> -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 > >>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar > >>>>>>>>>>>> > >>>>>>>>>>>> I feel like maybe I'm missing a step. 
> >>>>>>>>>>>> > >>>>>>>>>>>> "shard5":{ > >>>>>>>>>>>> "state":"active", > >>>>>>>>>>>> "replicas":{ > >>>>>>>>>>>> "10.38.33.16:7575_solr_shard5-core1":{ > >>>>>>>>>>>> "shard":"shard5", > >>>>>>>>>>>> "state":"active", > >>>>>>>>>>>> "core":"shard5-core1", > >>>>>>>>>>>> "collection":"collection1", > >>>>>>>>>>>> "node_name":"10.38.33.16:7575_solr", > >>>>>>>>>>>> "base_url":"http://10.38.33.16:7575/solr", > >>>>>>>>>>>> "leader":"true"}, > >>>>>>>>>>>> "10.38.33.17:7577_solr_shard5-core2":{ > >>>>>>>>>>>> "shard":"shard5", > >>>>>>>>>>>> "state":"recovering", > >>>>>>>>>>>> "core":"shard5-core2", > >>>>>>>>>>>> "collection":"collection1", > >>>>>>>>>>>> "node_name":"10.38.33.17:7577_solr", > >>>>>>>>>>>> "base_url":"http://10.38.33.17:7577/solr"}}} > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller < > >> markrmiller@gmail.com > >>>>>>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have > >>>>>>>> reported > >>>>>>>>>>>>> trouble upgrading a previous zk install when this change > came. > >> I > >>>>>>>>>>>>> recommended manually updating the clusterstate.json to have > the > >>>>>>>> right > >>>>>>>>>>> info, > >>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to start > >>>>>>>> from a > >>>>>>>>>>> clean > >>>>>>>>>>>>> zk state. > >>>>>>>>>>>>> > >>>>>>>>>>>>> If you don't have that range information, I think there will > be > >>>>>>>>>>> trouble. > >>>>>>>>>>>>> Do you have an router type defined in the clusterstate.json? > >>>>>>>>>>>>> > >>>>>>>>>>>>> - Mark > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson > > >>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Where is this information stored in ZK? I don't see it in > the > >>>>>>>> cluster > >>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Perhaps something with my process is broken. What I do > when I > >>>>>>>> start > >>>>>>>>>>> from > >>>>>>>>>>>>>> scratch is the following > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ZkCLI -cmd upconfig ... > >>>>>>>>>>>>>> ZkCLI -cmd linkconfig .... > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> but I don't ever explicitly create the collection. What > >> should > >>>>>>>> the > >>>>>>>>>>> steps > >>>>>>>>>>>>>> from scratch be? I am moving from an unreleased snapshot of > >> 4.0 > >>>>>>>> so I > >>>>>>>>>>>>> never > >>>>>>>>>>>>>> did that previously either so perhaps I did create the > >>>>>>>> collection in > >>>>>>>>>>> one > >>>>>>>>>>>>> of > >>>>>>>>>>>>>> my steps to get this working but have forgotten it along the > >> way. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller < > >>>>>>>> markrmiller@gmail.com> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned > up > >>>>>>>> front > >>>>>>>>>>>>> when a > >>>>>>>>>>>>>>> collection is created - each shard gets a range, which is > >>>>>>>> stored in > >>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the same > id > >> on > >>>>>>>>>>>>> different > >>>>>>>>>>>>>>> shards - something very odd going on. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you > reproduce. > >>>>>>>> Ideally > >>>>>>>>>>> we > >>>>>>>>>>>>>>> can capture it in a test case. 
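
[A rough sketch of a from-scratch bootstrap along the lines discussed above: upload and link the config with ZkCLI as Jamie describes, then create the collection explicitly so numShards, the compositeId router and the hash ranges are fixed up front. The classpath, host names and Collections API parameters here are assumptions based on the 4.2 examples and should be checked against your install.]

    # upload the config set and link it to the collection
    java -cp "solr-webapp/webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI \
        -cmd upconfig -zkhost so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
        -confdir ./solr-conf -confname solr-conf
    java -cp "solr-webapp/webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI \
        -cmd linkconfig -zkhost so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
        -collection collection1 -confname solr-conf

    # create the collection with an explicit shard count (or pass -DnumShards=6
    # when the very first node is started)
    curl "http://10.38.33.16:7575/solr/admin/collections?action=CREATE&name=collection1&numShards=6&replicationFactor=2&collection.configName=solr-conf"
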
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> - Mark > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson < > jej2003@gmail.com > >>> > >>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the > >>>>>>>> parameter > >>>>>>>>>>> set I > >>>>>>>>>>>>>>> am > >>>>>>>>>>>>>>>> seeing this behavior. I've been able to duplicate it on > >> 4.2.0 > >>>>>>>> by > >>>>>>>>>>>>>>> indexing > >>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I get > to > >>>>>>>> 400,000 > >>>>>>>>>>> or > >>>>>>>>>>>>>>> so. > >>>>>>>>>>>>>>>> I will try this on 4.2.1. to see if I see the same > behavior > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson < > >>>>>>>> jej2003@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Since I don't have that many items in my index I exported > >> all > >>>>>>>> of > >>>>>>>>>>> the > >>>>>>>>>>>>>>> keys > >>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that > checks > >> for > >>>>>>>>>>>>>>> duplicates. > >>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a grep > of > >> the > >>>>>>>>>>> files > >>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the > wrong > >>>>>>>> places. > >>>>>>>>>>>>> If > >>>>>>>>>>>>>>> you > >>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and > shard > >> 5. > >>>>>>>> Is > >>>>>>>>>>> it > >>>>>>>>>>>>>>>>> possible that the hash is being calculated taking into > >>>>>>>> account only > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>> "live" nodes? I know that we don't specify the numShards > >>>>>>>> param @ > >>>>>>>>>>>>>>> startup > >>>>>>>>>>>>>>>>> so could this be what is happening? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" * > >>>>>>>>>>>>>>>>> shard1-core1:0 > >>>>>>>>>>>>>>>>> shard1-core2:0 > >>>>>>>>>>>>>>>>> shard2-core1:0 > >>>>>>>>>>>>>>>>> shard2-core2:0 > >>>>>>>>>>>>>>>>> shard3-core1:1 > >>>>>>>>>>>>>>>>> shard3-core2:1 > >>>>>>>>>>>>>>>>> shard4-core1:0 > >>>>>>>>>>>>>>>>> shard4-core2:0 > >>>>>>>>>>>>>>>>> shard5-core1:1 > >>>>>>>>>>>>>>>>> shard5-core2:1 > >>>>>>>>>>>>>>>>> shard6-core1:0 > >>>>>>>>>>>>>>>>> shard6-core2:0 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson < > >>>>>>>> jej2003@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I just > >>>>>>>> indexed > >>>>>>>>>>>>> 300,000 > >>>>>>>>>>>>>>>>>> items, and some how 300,020 ended up in the index. I > >> thought > >>>>>>>>>>>>> perhaps I > >>>>>>>>>>>>>>>>>> messed something up so I started the indexing again and > >>>>>>>> indexed > >>>>>>>>>>>>> another > >>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs. Is there a good way to > >> find > >>>>>>>>>>>>> possibile > >>>>>>>>>>>>>>>>>> duplicates? I had tried to facet on key (our id field) > >> but > >>>>>>>> that > >>>>>>>>>>>>> didn't > >>>>>>>>>>>>>>>>>> give me anything with more than a count of 1. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson < > >>>>>>>> jej2003@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to > go > >>>>>>>> again. 
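
[Two lighter-weight versions of the duplicate check described above, without exporting all of the keys. Core names, ports and the "key" field follow the examples in this thread. With the default facet.limit of 100 a duplicated id can easily be missed in a distributed facet, which may be why the earlier facet attempt came back clean; facet.limit=-1 (expensive) or per-core queries with distrib=false are more reliable.]

    # ask a single core directly, bypassing the distributed fan-out
    curl "http://10.38.33.16:7575/solr/shard3-core1/select?q=key:7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de&fl=key&distrib=false&wt=json"

    # list any id that appears more than once across the whole collection
    curl "http://10.38.33.16:7575/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=key&facet.mincount=2&facet.limit=-1&wt=json"
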
> >>>>>>>>>>> I > >>>>>>>>>>>>> am > >>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the > >> problem on > >>>>>>>>>>> 4.2.0 > >>>>>>>>>>>>>>> and then > >>>>>>>>>>>>>>>>>>> I'll try on 4.2.1 > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller < > >>>>>>>>>>> markrmiller@gmail.com > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> No, not that I know if, which is why I say we need to > >> get > >>>>>>>> to the > >>>>>>>>>>>>>>> bottom > >>>>>>>>>>>>>>>>>>>> of it. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> - Mark > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson < > >>>>>>>> jej2003@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Mark > >>>>>>>>>>>>>>>>>>>>> It's there a particular jira issue that you think may > >>>>>>>> address > >>>>>>>>>>>>> this? > >>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>>> read > >>>>>>>>>>>>>>>>>>>>> through it quickly but didn't see one that jumped out > >>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" < > >>>>>>>> jej2003@gmail.com> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did > >>>>>>>> nothing. I > >>>>>>>>>>> can > >>>>>>>>>>>>>>>>>>>> clear > >>>>>>>>>>>>>>>>>>>>>> the index and try4.2.1. I will save off the logs and > >> see > >>>>>>>> if > >>>>>>>>>>> there > >>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>>> anything else odd > >>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" < > >>>>>>>> markrmiller@gmail.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have > said. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be best > >> to > >>>>>>>> start > >>>>>>>>>>>>>>>>>>>> tracking in > >>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back > >> again. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really > >> need > >>>>>>>> to > >>>>>>>>>>> get > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's > >> fixed in > >>>>>>>>>>> 4.2.1 > >>>>>>>>>>>>>>>>>>>> (spreading > >>>>>>>>>>>>>>>>>>>>>>> to mirrors now). > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> - Mark > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson < > >>>>>>>> jej2003@gmail.com > >>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question. Is there > >>>>>>>> anything > >>>>>>>>>>>>> else > >>>>>>>>>>>>>>>>>>>> that I > >>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug? I'd > >> be > >>>>>>>> happy > >>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>> troll > >>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information is > >>>>>>>> needed, just > >>>>>>>>>>>>> let > >>>>>>>>>>>>>>> me > >>>>>>>>>>>>>>>>>>>>>>> know. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to fix > >>>>>>>> this. > >>>>>>>>>>> Is it > >>>>>>>>>>>>>>>>>>>>>>> required to > >>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let solr > >> resync > >>>>>>>>>>> things? 
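
[On the "how do I fix it" question above: restarting the behind replica, as Mark suggests, forces it back through the recovery path. If a restart is inconvenient, the CoreAdmin REQUESTRECOVERY action should do the same thing for a single core - treat the exact action name and behavior as an assumption and verify it against your 4.2.x install before relying on it.]

    curl "http://10.38.33.17:7577/solr/admin/cores?action=REQUESTRECOVERY&core=dsc-shard5-core2"
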
> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
> > sorry for spamming here....
> >
> > shard5-core2 is the instance we're having issues with...
> >
> > Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> > SEVERE: shard update error StdNode:
> > http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
> > Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
> > status:503, message:Service Unavailable
> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> >     at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >     at java.lang.Thread.run(Thread.java:662)
> >
> > On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >
> > > here is another one that looks interesting
> > >
> > > Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> > > SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
> > > the leader, but locally we don't think so
> > >     at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> > >     at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> > >     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> > >     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> > >     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> > >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> > >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson < > >>>>>>>>>>>>>>> jej2003@gmail.com > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master it looks like at some > point > >>>>>>>> there > >>>>>>>>>>> were > >>>>>>>>>>>>>>>>>>>> shards > >>>>>>>>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>>>>>>>>> went down. I am seeing things like what is > >> below. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> NFO: A cluster state change: WatchedEvent > >>>>>>>>>>>>> state:SyncConnected > >>>>>>>>>>>>>>>>>>>>>>>>>>> type:NodeChildrenChanged path:/live_nodes, has > >>>>>>>> occurred - > >>>>>>>>>>>>>>>>>>>>>>> updating... (live > >>>>>>>>>>>>>>>>>>>>>>>>>>> nodes size: 12) > >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM > >>>>>>>>>>>>>>>>>>>> org.apache.solr.common.cloud.ZkStateReader$3 > >>>>>>>>>>>>>>>>>>>>>>>>>>> process > >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9) > >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM > >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext > >>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess > >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process. > >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM > >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext > >>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader > >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the > leader. > >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM > >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext > >>>>>>>>>>>>>>>>>>>>>>>>>>> shouldIBeLeader > >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's > >> okay > >>>>>>>> to be > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> leader. > >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM > >>>>>>>>>>>>>>>>>>>>>>> org.apache.solr.cloud.ShardLeaderElectionContext > >>>>>>>>>>>>>>>>>>>>>>>>>>> runLeaderProcess > >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller < > >>>>>>>>>>>>>>>>>>>> markrmiller@gmail.com > >>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of > >>>>>>>> apply > >>>>>>>>>>> here. > >>>>>>>>>>>>>>>>>>>> Peersync > >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not look at that - it looks at version > >>>>>>>> numbers for > >>>>>>>>>>>>>>>>>>>> updates in > >>>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>>>> transaction log - it compares the last 100 of > >> them > >>>>>>>> on > >>>>>>>>>>>>> leader > >>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>> replica. > >>>>>>>>>>>>>>>>>>>>>>>>>>>> What it's saying is that the replica seems to > >> have > >>>>>>>>>>> versions > >>>>>>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>>>>> the leader > >>>>>>>>>>>>>>>>>>>>>>>>>>>> does not. Have you scanned the logs for any > >>>>>>>> interesting > >>>>>>>>>>>>>>>>>>>> exceptions? > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy > indexing? > >>>>>>>> Did > >>>>>>>>>>> any zk > >>>>>>>>>>>>>>>>>>>> session > >>>>>>>>>>>>>>>>>>>>>>>>>>>> timeouts occur? 
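
[A quick way to scan for the session expirations asked about here is to grep the Solr logs for the ZooKeeper expiry markers that appear elsewhere in this thread; the log path depends on how logging is set up, so treat it as a placeholder.]

    grep -iE "SessionExpiredException|cannot talk to ZK|state: Expired" logs/solr.log
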
> - Mark
>
> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
> > I am currently looking at moving our Solr cluster to 4.2 and noticed a
> > strange issue while testing today. Specifically the replica has a higher
> > version than the master which is causing the index to not replicate.
> > Because of this the replica has fewer documents than the master. What
> > could cause this and how can I resolve it short of taking down the index
> > and scping the right version in?
> >
> > MASTER:
> > Last Modified: about an hour ago
> > Num Docs: 164880
> > Max Doc: 164880
> > Deleted Docs: 0
> > Version: 2387
> > Segment Count: 23
> >
> > REPLICA:
> > Last Modified: about an hour ago
> > Num Docs: 164773
> > Max Doc: 164773
> > Deleted Docs: 0
> > Version: 3001
> > Segment Count: 30
> >
> > in the replicas log it says this:
> >
> > INFO: Creating new http client,
> > config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > Our versions are newer. ourLowThreshold=1431233788792274944
> > otherHigh=1431233789440294912
> >
> > Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> > INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
> > DONE. sync succeeded
> >
> > which again seems to point that it thinks it has a newer version of the
> > index so it aborts. This happened while having 10 threads indexing 10,000
> > items writing to a 6 shard (1 replica each) cluster. Any thoughts on this
> > or what I should look for would be appreciated.
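
[For anyone comparing a leader and a replica the way the numbers above were gathered: rows=0 with distrib=false returns numFound for just that core, and the replication handler reports the index version and generation. Core names and ports follow this thread, and the /replication path assumes the stock handler from the example solrconfig.xml.]

    curl "http://10.38.33.16:7575/solr/dsc-shard5-core1/select?q=*:*&rows=0&distrib=false&wt=json"
    curl "http://10.38.33.17:7577/solr/dsc-shard5-core2/select?q=*:*&rows=0&distrib=false&wt=json"

    curl "http://10.38.33.16:7575/solr/dsc-shard5-core1/replication?command=details&wt=json"
    curl "http://10.38.33.17:7577/solr/dsc-shard5-core2/replication?command=details&wt=json"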