lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Thu, 04 Apr 2013 02:01:29 GMT
So something is still not right. Things were going OK, but I'm seeing this
in the logs of several of the replicas:

SEVERE: Unable to create core: dsc-shard3-core1
org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
        at org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:967)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1049)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
        at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797)
        ... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
        at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411)
        ... 15 more
Caused by: java.io.FileNotFoundException: /cce2/solr/data/dsc-shard3-core1/index/_13x.si (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
        at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
        at org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:50)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:301)
        at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
        at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
        at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
        ... 18 more



On Wed, Apr 3, 2013 at 8:54 PM, Jamie Johnson <jej2003@gmail.com> wrote:

> Thanks I will try that.
>
>
> On Wed, Apr 3, 2013 at 8:28 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>>
>>
>> On Apr 3, 2013, at 8:17 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>
>> > I am not using the concurrent low pause garbage collector, I could look at
>> > switching, I'm assuming you're talking about adding -XX:+UseConcMarkSweepGC
>> > correct?
>>
>> Right - if you don't do that, the default is almost always the throughput
>> collector (I've only seen OSX buck this trend when apple handled java).
>> That means stop the world garbage collections, so with larger heaps, that
>> can be a fair amount of time that no threads can run. It's not that great
>> for something as interactive as search generally is anyway, but it's also
>> not that great when added to heavy load and a 15 sec session timeout
>> between solr and zk.
>>
>>
>> The below is odd - a replica node is waiting for the leader to see it as
>> recovering and live - live means it has created an ephemeral node for that
>> Solr corecontainer in zk - it's very strange if that didn't happen, unless
>> this happened during shutdown or something.
>>
>> >
>> > I also just had a shard go down and am seeing this in the log
>> >
>> > SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
>> > down for 10.38.33.17:7576_solr but I still do not see the requested state.
>> > I see state: recovering live:false
>> >        at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:890)
>> >        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
>> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >        at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
>> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
>> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>> >        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>> >        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>> >        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>> >        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>> >        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>> >
>> > Nothing other than this in the log jumps out as interesting though.
>> >
>> >
>> > On Wed, Apr 3, 2013 at 7:47 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> >
>> >> This shouldn't be a problem though, if things are working as they are
>> >> supposed to. Another node should simply take over as the overseer and
>> >> continue processing the work queue. It's just best if you configure so that
>> >> session timeouts don't happen unless a node is really down. On the other
>> >> hand, it's nicer to detect that faster. Your tradeoff to make.
>> >>
>> >> - Mark
>> >>
>> >> On Apr 3, 2013, at 7:46 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> >>
>> >>> Yeah. Are you using the concurrent low pause garbage collector?
>> >>>
>> >>> This means the overseer wasn't able to communicate with zk for 15
>> >>> seconds - due to load or gc or whatever. If you can't resolve the root
>> >>> cause of that, or the load just won't allow for it, next best thing you can
>> >>> do is raise it to 30 seconds.
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Apr 3, 2013, at 7:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>
>> >>>> I am occasionally seeing this in the log, is this just a timeout issue?
>> >>>> Should I be increasing the zk client timeout?
>> >>>>
>> >>>> WARNING: Overseer cannot talk to ZK
>> >>>> Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process
>> >>>> INFO: Watcher fired on path: null state: Expired type None
>> >>>> Apr 3, 2013 11:14:25 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater run
>> >>>> WARNING: Solr cannot talk to ZK, exiting Overseer main queue loop
>> >>>> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer/queue
>> >>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>> >>>>      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> >>>>      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
>> >>>>      at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:236)
>> >>>>      at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:233)
>> >>>>      at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
>> >>>>      at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:233)
>> >>>>      at org.apache.solr.cloud.DistributedQueue.orderedChildren(DistributedQueue.java:89)
>> >>>>      at org.apache.solr.cloud.DistributedQueue.element(DistributedQueue.java:131)
>> >>>>      at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:326)
>> >>>>      at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:128)
>> >>>>      at java.lang.Thread.run(Thread.java:662)
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, Apr 3, 2013 at 7:25 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>
>> >>>>> just an update, I'm at 1M records now with no issues.  This looks
>> >>>>> promising as to the cause of my issues, thanks for the help.  Is the
>> >>>>> routing method with numShards documented anywhere?  I know numShards is
>> >>>>> documented but I didn't know that the routing changed if you don't specify
>> >>>>> it.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 3, 2013 at 4:44 PM, Jamie Johnson <jej2003@gmail.com>
>> >> wrote:
>> >>>>>
>> >>>>>> with these changes things are looking good, I'm up to 600,000
>> >> documents
>> >>>>>> without any issues as of right now.  I'll keep going and add more
>> to
>> >> see if
>> >>>>>> I find anything.
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Apr 3, 2013 at 4:01 PM, Jamie Johnson <jej2003@gmail.com>
>> >> wrote:
>> >>>>>>
>> >>>>>>> ok, so that's not a deal breaker for me.  I just changed it to
>> match
>> >> the
>> >>>>>>> shards that are auto created and it looks like things are happy.
>> >> I'll go
>> >>>>>>> ahead and try my test to see if I can get things out of sync.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Apr 3, 2013 at 3:56 PM, Mark Miller <
>> markrmiller@gmail.com
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> I had thought you could - but looking at the code recently, I don't
>> >>>>>>>> think you can anymore. I think that's a technical limitation more than
>> >>>>>>>> anything though. When these changes were made, I think support for that was
>> >>>>>>>> simply not added at the time.
>> >>>>>>>>
>> >>>>>>>> I'm not sure exactly how straightforward it would be, but it seems
>> >>>>>>>> doable - as it is, the overseer will preallocate shards when first creating
>> >>>>>>>> the collection - that's when they get named shard(n). There would have to
>> >>>>>>>> be logic to replace shard(n) with the custom shard name when the core
>> >>>>>>>> actually registers.
>> >>>>>>>>
>> >>>>>>>> - Mark
>> >>>>>>>>
>> >>>>>>>> On Apr 3, 2013, at 3:42 PM, Jamie Johnson <jej2003@gmail.com>
>> >> wrote:
>> >>>>>>>>
>> >>>>>>>>> answered my own question, it now says compositeId.  What is
>> >>>>>>>> problematic
>> >>>>>>>>> though is that in addition to my shards (which are say
>> >> jamie-shard1)
>> >>>>>>>> I see
>> >>>>>>>>> the solr created shards (shard1).  I assume that these were
>> created
>> >>>>>>>> because
>> >>>>>>>>> of the numShards param.  Is there no way to specify the names of
>> >> these
>> >>>>>>>>> shards?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <
>> jej2003@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> ah interesting....so I need to specify num shards, blow out zk
>> and
>> >>>>>>>> then
>> >>>>>>>>>> try this again to see if things work properly now.  What is
>> really
>> >>>>>>>> strange
>> >>>>>>>>>> is that for the most part things have worked right and on
>> 4.2.1 I
>> >>>>>>>> have
>> >>>>>>>>>> 600,000 items indexed with no duplicates.  In any event I will
>> >>>>>>>> specify num
>> >>>>>>>>>> shards clear out zk and begin again.  If this works properly
>> what
>> >>>>>>>> should
>> >>>>>>>>>> the router type be?
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <
>> >> markrmiller@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> If you don't specify numShards after 4.1, you get an implicit doc router
>> >>>>>>>>>>> and it's up to you to distribute updates. In the past, partitioning was
>> >>>>>>>>>>> done on the fly - but for shard splitting and perhaps other features, we
>> >>>>>>>>>>> now divvy up the hash range up front based on numShards and store it in
>> >>>>>>>>>>> ZooKeeper. No numShards is now how you take complete control of updates
>> >>>>>>>>>>> yourself.
>> >>>>>>>>>>>
>> >>>>>>>>>>> - Mark
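
A minimal sketch of what "divvy up the hash range up front based on numShards" can look like, assuming the signed 32-bit hash space is cut into numShards contiguous, non-overlapping slices. The names HashRangeSketch, Range, partition and shardFor are hypothetical, and String.hashCode() is only a stand-in for whatever hash Solr applies to the document id; the exact range boundaries Solr stores in ZooKeeper may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not Solr's router implementation.
class HashRangeSketch {

    // Inclusive range [min, max] over the signed 32-bit hash space.
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }

        @Override
        public String toString() {
            return Integer.toHexString(min) + "-" + Integer.toHexString(max);
        }
    }

    // Splits the full int range into numShards contiguous slices, fixed up front.
    static List<Range> partition(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long step = (1L << 32) / numShards;
        List<Range> ranges = new ArrayList<>();
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last slice absorbs the remainder so the whole space is covered.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + step - 1;
            ranges.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    // Index of the slice that owns this id; the same id always maps to the same slice.
    static int shardFor(String id, List<Range> ranges) {
        int hash = id.hashCode(); // stand-in hash, an assumption of this sketch
        for (int i = 0; i < ranges.size(); i++) {
            if (ranges.get(i).includes(hash)) {
                return i;
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + hash);
    }

    public static void main(String[] args) {
        List<Range> ranges = partition(6);
        System.out.println(ranges);
        System.out.println(shardFor("7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de", ranges));
    }
}
```

Because the slices are computed once and do not depend on which nodes are currently live, a given id can only ever belong to one slice under this scheme.
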
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2003@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> The router says "implicit".  I did start from a blank zk
>> state
>> >> but
>> >>>>>>>>>>> perhaps
>> >>>>>>>>>>>> I missed one of the ZkCLI commands?  One of my shards from
>> the
>> >>>>>>>>>>>> clusterstate.json is shown below.  What is the process that
>> >> should
>> >>>>>>>> be
>> >>>>>>>>>>> done
>> >>>>>>>>>>>> to bootstrap a cluster other than the ZkCLI commands I listed
>> >>>>>>>> above?  My
>> >>>>>>>>>>>> process right now is run those ZkCLI commands and then start
>> >> solr
>> >>>>>>>> on
>> >>>>>>>>>>> all of
>> >>>>>>>>>>>> the instances with a command like this
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> java -server -Dshard=shard5 -DcoreName=shard5-core1
>> >>>>>>>>>>>> -Dsolr.data.dir=/solr/data/shard5-core1 -Dcollection.configName=solr-conf
>> >>>>>>>>>>>> -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181
>> >>>>>>>>>>>> -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I feel like maybe I'm missing a step.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> "shard5":{
>> >>>>>>>>>>>>    "state":"active",
>> >>>>>>>>>>>>    "replicas":{
>> >>>>>>>>>>>>      "10.38.33.16:7575_solr_shard5-core1":{
>> >>>>>>>>>>>>        "shard":"shard5",
>> >>>>>>>>>>>>        "state":"active",
>> >>>>>>>>>>>>        "core":"shard5-core1",
>> >>>>>>>>>>>>        "collection":"collection1",
>> >>>>>>>>>>>>        "node_name":"10.38.33.16:7575_solr",
>> >>>>>>>>>>>>        "base_url":"http://10.38.33.16:7575/solr",
>> >>>>>>>>>>>>        "leader":"true"},
>> >>>>>>>>>>>>      "10.38.33.17:7577_solr_shard5-core2":{
>> >>>>>>>>>>>>        "shard":"shard5",
>> >>>>>>>>>>>>        "state":"recovering",
>> >>>>>>>>>>>>        "core":"shard5-core2",
>> >>>>>>>>>>>>        "collection":"collection1",
>> >>>>>>>>>>>>        "node_name":"10.38.33.17:7577_solr",
>> >>>>>>>>>>>>        "base_url":"http://10.38.33.17:7577/solr"}}}
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <
>> >> markrmiller@gmail.com
>> >>>>>>>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> It should be part of your clusterstate.json. Some users have reported
>> >>>>>>>>>>>>> trouble upgrading a previous zk install when this change came. I
>> >>>>>>>>>>>>> recommended manually updating the clusterstate.json to have the right info,
>> >>>>>>>>>>>>> and that seemed to work. Otherwise, I guess you have to start from a clean
>> >>>>>>>>>>>>> zk state.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> If you don't have that range information, I think there will be trouble.
>> >>>>>>>>>>>>> Do you have a router type defined in the clusterstate.json?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> - Mark
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <
>> jej2003@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Where is this information stored in ZK?  I don't see it in
>> the
>> >>>>>>>> cluster
>> >>>>>>>>>>>>>> state (or perhaps I don't understand it ;) ).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Perhaps something with my process is broken.  What I do
>> when I
>> >>>>>>>> start
>> >>>>>>>>>>> from
>> >>>>>>>>>>>>>> scratch is the following
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> ZkCLI -cmd upconfig ...
>> >>>>>>>>>>>>>> ZkCLI -cmd linkconfig ....
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> but I don't ever explicitly create the collection.  What
>> >> should
>> >>>>>>>> the
>> >>>>>>>>>>> steps
>> >>>>>>>>>>>>>> from scratch be?  I am moving from an unreleased snapshot
>> of
>> >> 4.0
>> >>>>>>>> so I
>> >>>>>>>>>>>>> never
>> >>>>>>>>>>>>>> did that previously either so perhaps I did create the
>> >>>>>>>> collection in
>> >>>>>>>>>>> one
>> >>>>>>>>>>>>> of
>> >>>>>>>>>>>>>> my steps to get this working but have forgotten it along
>> the
>> >> way.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <
>> >>>>>>>> markrmiller@gmail.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Thanks for digging Jamie. In 4.2, hash ranges are assigned up front when a
>> >>>>>>>>>>>>>>> collection is created - each shard gets a range, which is stored in
>> >>>>>>>>>>>>>>> zookeeper. You should not be able to end up with the same id on different
>> >>>>>>>>>>>>>>> shards - something very odd going on.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hopefully I'll have some time to try and help you
>> reproduce.
>> >>>>>>>> Ideally
>> >>>>>>>>>>> we
>> >>>>>>>>>>>>>>> can capture it in a test case.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> - Mark
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <
>> jej2003@gmail.com
>> >>>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> no, my thought was wrong, it appears that even with the
>> >>>>>>>> parameter
>> >>>>>>>>>>> set I
>> >>>>>>>>>>>>>>> am
>> >>>>>>>>>>>>>>>> seeing this behavior.  I've been able to duplicate it on
>> >> 4.2.0
>> >>>>>>>> by
>> >>>>>>>>>>>>>>> indexing
>> >>>>>>>>>>>>>>>> 100,000 documents on 10 threads (10,000 each) when I get
>> to
>> >>>>>>>> 400,000
>> >>>>>>>>>>> or
>> >>>>>>>>>>>>>>> so.
>> >>>>>>>>>>>>>>>> I will try this on 4.2.1. to see if I see the same
>> behavior
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <
>> >>>>>>>> jej2003@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Since I don't have that many items in my index I exported all of the keys
>> >>>>>>>>>>>>>>>>> for each shard and wrote a simple java program that checks for duplicates.
>> >>>>>>>>>>>>>>>>> I found some duplicate keys on different shards, a grep of the files for
>> >>>>>>>>>>>>>>>>> the keys found does indicate that they made it to the wrong places.  If you
>> >>>>>>>>>>>>>>>>> notice documents with the same ID are on shard 3 and shard 5.  Is it
>> >>>>>>>>>>>>>>>>> possible that the hash is being calculated taking into account only the
>> >>>>>>>>>>>>>>>>> "live" nodes?  I know that we don't specify the numShards param @ startup
>> >>>>>>>>>>>>>>>>> so could this be what is happening?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>> >>>>>>>>>>>>>>>>> shard1-core1:0
>> >>>>>>>>>>>>>>>>> shard1-core2:0
>> >>>>>>>>>>>>>>>>> shard2-core1:0
>> >>>>>>>>>>>>>>>>> shard2-core2:0
>> >>>>>>>>>>>>>>>>> shard3-core1:1
>> >>>>>>>>>>>>>>>>> shard3-core2:1
>> >>>>>>>>>>>>>>>>> shard4-core1:0
>> >>>>>>>>>>>>>>>>> shard4-core2:0
>> >>>>>>>>>>>>>>>>> shard5-core1:1
>> >>>>>>>>>>>>>>>>> shard5-core2:1
>> >>>>>>>>>>>>>>>>> shard6-core1:0
>> >>>>>>>>>>>>>>>>> shard6-core2:0
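
A cross-shard duplicate check of the kind described above could look roughly like the sketch below. It is not the program used in this thread: the class name CrossShardDuplicates is made up for illustration, and it assumes one exported key per line, with one export file per core named like the files in the grep output (shard3-core1, shard3-core2, ...). Replicas of the same shard are expected to hold the same keys, so files are grouped by shard name before comparing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not the program mentioned in the thread.
class CrossShardDuplicates {

    // Returns every key that appears on two or more distinct shards,
    // mapped to the shards it was found on.
    static Map<String, Set<String>> findDuplicates(Map<String, List<String>> keysByShard) {
        Map<String, Set<String>> shardsByKey = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : keysByShard.entrySet()) {
            for (String rawKey : entry.getValue()) {
                String key = rawKey.trim();
                if (key.isEmpty()) {
                    continue; // ignore blank lines in an export
                }
                shardsByKey.computeIfAbsent(key, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        // Keys seen on a single shard are fine; drop them.
        shardsByKey.values().removeIf(shards -> shards.size() < 2);
        return shardsByKey;
    }

    // Usage (assumed file layout): java CrossShardDuplicates shard1-core1 shard1-core2 ...
    public static void main(String[] args) throws IOException {
        Map<String, List<String>> keysByShard = new TreeMap<>();
        for (String arg : args) {
            Path file = Paths.get(arg);
            // Strip the "-coreN" suffix so both replicas count as one shard.
            String shard = file.getFileName().toString().replaceFirst("-core\\d+$", "");
            keysByShard.computeIfAbsent(shard, s -> new ArrayList<>())
                       .addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        findDuplicates(keysByShard)
            .forEach((key, shards) -> System.out.println(key + " -> " + shards));
    }
}
```
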
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <
>> >>>>>>>> jej2003@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Something interesting that I'm noticing as well, I just indexed 300,000
>> >>>>>>>>>>>>>>>>>> items, and somehow 300,020 ended up in the index.  I thought perhaps I
>> >>>>>>>>>>>>>>>>>> messed something up so I started the indexing again and indexed another
>> >>>>>>>>>>>>>>>>>> 400,000 and I see 400,064 docs.  Is there a good way to find possible
>> >>>>>>>>>>>>>>>>>> duplicates?  I had tried to facet on key (our id field) but that didn't
>> >>>>>>>>>>>>>>>>>> give me anything with more than a count of 1.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <
>> >>>>>>>> jej2003@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Ok, so clearing the transaction log allowed things to
>> go
>> >>>>>>>> again.
>> >>>>>>>>>>> I
>> >>>>>>>>>>>>> am
>> >>>>>>>>>>>>>>>>>>> going to clear the index and try to replicate the
>> >> problem on
>> >>>>>>>>>>> 4.2.0
>> >>>>>>>>>>>>>>> and then
>> >>>>>>>>>>>>>>>>>>> I'll try on 4.2.1
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <
>> >>>>>>>>>>> markrmiller@gmail.com
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> No, not that I know of, which is why I say we need to get to the bottom
>> >>>>>>>>>>>>>>>>>>>> of it.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> - Mark
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <
>> >>>>>>>> jej2003@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Mark
>> >>>>>>>>>>>>>>>>>>>>> Is there a particular JIRA issue that you think may address this? I
>> >>>>>>>>>>>>>>>>>>>>> read through it quickly but didn't see one that jumped out
>> >>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <
>> >>>>>>>> jej2003@gmail.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> I brought the bad one down and back up and it did nothing.  I can clear
>> >>>>>>>>>>>>>>>>>>>>>> the index and try 4.2.1. I will save off the logs and see if there is
>> >>>>>>>>>>>>>>>>>>>>>> anything else odd
>> >>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <
>> >>>>>>>> markrmiller@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> It would appear it's a bug given what you have
>> said.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Any other exceptions would be useful. Might be
>> best
>> >> to
>> >>>>>>>> start
>> >>>>>>>>>>>>>>>>>>>> tracking in
>> >>>>>>>>>>>>>>>>>>>>>>> a JIRA issue as well.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> To fix, I'd bring the behind node down and back
>> >> again.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really
>> >> need
>> >>>>>>>> to
>> >>>>>>>>>>> get
>> >>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>> bottom of this and fix it, or determine if it's
>> >> fixed in
>> >>>>>>>>>>> 4.2.1
>> >>>>>>>>>>>>>>>>>>>> (spreading
>> >>>>>>>>>>>>>>>>>>>>>>> to mirrors now).
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> - Mark
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <
>> >>>>>>>> jej2003@gmail.com
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Sorry I didn't ask the obvious question.  Is
>> there
>> >>>>>>>> anything
>> >>>>>>>>>>>>> else
>> >>>>>>>>>>>>>>>>>>>> that I
>> >>>>>>>>>>>>>>>>>>>>>>>> should be looking for here and is this a bug?
>>  I'd
>> >> be
>> >>>>>>>> happy
>> >>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>>> troll
>> >>>>>>>>>>>>>>>>>>>>>>>> through the logs further if more information is
>> >>>>>>>> needed, just
>> >>>>>>>>>>>>> let
>> >>>>>>>>>>>>>>> me
>> >>>>>>>>>>>>>>>>>>>>>>> know.
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Also what is the most appropriate mechanism to
>> fix
>> >>>>>>>> this.
>> >>>>>>>>>>> Is it
>> >>>>>>>>>>>>>>>>>>>>>>> required to
>> >>>>>>>>>>>>>>>>>>>>>>>> kill the index that is out of sync and let solr
>> >> resync
>> >>>>>>>>>>> things?
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <
>> >>>>>>>>>>>>> jej2003@gmail.com
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> sorry for spamming here....
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> shard5-core2 is the instance we're having issues
>> >>>>>>>> with...
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
>> >>>>>>>>>>>>>>>>>>>>>>>>> Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
>> >>>>>>>>>>>>>>>>>>>>>>>>> status:503, message:Service Unavailable
>> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >>>>>>>>>>>>>>>>>>>>>>>>> at java.lang.Thread.run(Thread.java:662)
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <
>> >>>>>>>>>>>>>>> jej2003@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> here is another one that looks interesting
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>>>>>>>>>>>>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
>> >>>>>>>>>>>>>>>>>>>>>>>>>> the leader, but locally we don't think so
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>> >>>>>>>>>>>>>>>>>>>>>>>>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking at the master, it looks like at some point there were
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> shards that went down.  I am seeing things like what is below.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Running the leader process.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> PeerSync does not look at that - it looks at version numbers
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> for updates in the transaction log - it compares the last 100
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> of them on leader and replica. What it's saying is that the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> replica seems to have versions that the leader does not. Have
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> you scanned the logs for any interesting exceptions?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any zk
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> session timeouts occur?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> - Mark
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> noticed a strange issue while testing today.  Specifically,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the replica has a higher version than the master, which is
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> causing the index to not replicate. Because of this the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> replica has fewer documents than the master.  What could
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cause this, and how can I resolve it short of taking down
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the index and scping the right version in?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MASTER:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified:about an hour ago
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164880
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164880
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:2387
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:23
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> REPLICA:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Num Docs:164773
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Max Doc:164773
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Deleted Docs:0
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Version:3001
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Segment Count:30
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> In the replica's log it says this:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> This again seems to indicate that the replica thinks it has
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> a newer version of the index, so it aborts.  This happened
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> with 10 threads indexing 10,000 items into a 6-shard
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (1 replica each) cluster.  Any thoughts on this, or on what
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I should look for, would be appreciated.
>
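
For anyone comparing the two thresholds in the "Our versions are newer" log line quoted above, here is a minimal Python sketch. It rests on an assumption that is not stated anywhere in this thread: that a Solr update version is the indexing time in epoch milliseconds shifted left by 20 bits, plus a small counter. The helper names (version_to_millis, version_to_utc, gap_millis) are made up for illustration and are not part of Solr.

```python
from datetime import datetime, timezone

# Values copied verbatim from the PeerSync log line quoted in this thread.
OUR_LOW_THRESHOLD = 1431233788792274944
OTHER_HIGH = 1431233789440294912

# ASSUMPTION (not confirmed in the thread): version = (epoch millis << 20) + counter.
CLOCK_SHIFT = 20


def version_to_millis(version: int) -> int:
    """Recover the epoch-millisecond part of a positive update version."""
    return version >> CLOCK_SHIFT


def version_to_utc(version: int) -> datetime:
    """Decode a version into a UTC timestamp, under the assumption above."""
    return datetime.fromtimestamp(version_to_millis(version) / 1000, tz=timezone.utc)


def gap_millis(older: int, newer: int) -> int:
    """Millisecond distance between two versions (positive if `newer` is later)."""
    return version_to_millis(newer) - version_to_millis(older)


if __name__ == "__main__":
    print("ourLowThreshold decodes to", version_to_utc(OUR_LOW_THRESHOLD).isoformat())
    print("otherHigh decodes to      ", version_to_utc(OTHER_HIGH).isoformat())
    print("otherHigh > ourLowThreshold:", OTHER_HIGH > OUR_LOW_THRESHOLD)
    print("gap in ms:", gap_millis(OUR_LOW_THRESHOLD, OTHER_HIGH))
```

If the assumption holds, the decoded times should land around the period of heavy indexing, which gives a quick sanity check against the timestamps in the log.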
