lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SolrCloud High Availability during indexing operation
Date Wed, 09 Oct 2013 02:39:43 GMT
The attachment did not go through - try using pastebin.com or something.

Are you adding docs with curl one at a time, or in bulk per request?

- Mark
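
On the client-side question from the original post: below is a minimal sketch, assuming updates are sent as JSON over HTTP, of checking every update response and retrying the batches that fail while a new leader is being elected. The URL, the collection name, the batch size, and the helper names `post_batch` and `index_with_retry` are illustrative assumptions, not something taken from this thread.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed endpoint; adjust host, port and collection for your cluster.
SOLR_UPDATE_URL = "http://localhost:8983/solr/test_collection/update"


def post_batch(docs, url=SOLR_UPDATE_URL, timeout=30):
    """POST one batch of docs as JSON; return True only on an HTTP 2xx reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error replies, refused connections and timeouts.
        return False


def index_with_retry(docs, send=post_batch, batch_size=100,
                     max_attempts=5, delay=2.0):
    """Send docs in batches, retrying failed batches.

    Returns the docs that were never accepted, so the caller can
    log them or resubmit them later.
    """
    failed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        for attempt in range(1, max_attempts + 1):
            if send(batch):
                break
            if attempt < max_attempts:
                time.sleep(delay)  # give the cluster time to elect a leader
        else:
            # No attempt succeeded: remember the whole batch.
            failed.extend(batch)
    return failed
```

Resending a batch should be harmless when every doc carries a <uniqueKey>, since a later add with the same key replaces the earlier one. An empty return value only means every request was acknowledged; the docs still need a commit before they show up in the counts.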

On Oct 8, 2013, at 9:58 PM, Saurabh Saxena <ssaxena@gopivotal.com> wrote:

> Repeated the experiment on a local system: a single-shard SolrCloud with one replica.
> I tried to index 10K docs, sending all indexing requests to the replica Solr node.
> While the documents were being indexed on the replica, I shut down the leader Solr node.
> Out of 10K docs, only 9900 got indexed. If I repeat the experiment without shutting down
> the leader instance, all 10K docs get indexed. I am using curl to upload the docs, and
> there was no curl error while uploading documents.
> 
> The following error was in the replica log file:
> 
> ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1
> 
> Attached replica log file. 
> 
> 
> On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena <ssaxena@gopivotal.com> wrote:
> Sorry for the late reply.
> 
> All the documents have a unique id. If I repeat the experiment, the number of docs indexed
> changes (I guess it depends on when I shut down a particular shard). When I do the experiment
> without shutting down the leader shards, all 80k docs get indexed (which I think proves that
> all documents are valid).
> 
> I need to dig through the logs to find the error message. Also, I am not tracking the curl
> return code; I will run again and reply.
> 
> Regards,
> Saurabh 
> 
> 
> On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> And do any of the documents have the same <uniqueKey>, which
> is usually called "id"? Subsequent adds of docs with the same
> <uniqueKey> replace the earlier one.
> 
> It's not definitive because it changes as merges happen, old copies
> of docs that have been deleted or updated will be purged, but what
> does your admin page show for "maxDoc"? If it's more than "numDocs"
> then you have duplicate <uniqueKey>s. NOTE: if you optimize
> (which you usually shouldn't) then maxDoc and numDocs will be
> the same so if you test this don't optimize.
> 
> Best,
> Erick
> 
> 
> On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood
> <wunder@wunderwood.org> wrote:
> > Did all of the curl update commands return success? Any errors in the logs?
> >
> > wunder
> >
> > On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:
> >
> >> Is it possible that some of those 80K docs were simply not valid? e.g.
> >> had a wrong field, had a missing required field, anything like that?
> >> What happens if you clear this collection and just re-run the same
> >> indexing process and do everything else the same?  Still some docs
> >> missing?  Same number?
> >>
> >> And what if you take 1 document that you know is valid and index it
> >> 80K times, with a different ID, of course?  Do you see 80K docs in the
> >> end?
> >>
> >> Otis
> >> --
> >> Solr & ElasticSearch Support -- http://sematext.com/
> >> Performance Monitoring -- http://sematext.com/spm
> >>
> >>
> >>
> >> On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena <ssaxena@gopivotal.com> wrote:
> >>> Doc count did not change after I restarted the nodes. I am doing a single
> >>> commit after all 80k docs. Using Solr 4.4.
> >>>
> >>> Regards,
> >>> Saurabh
> >>>
> >>>
> >>> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
> >>> otis.gospodnetic@gmail.com> wrote:
> >>>
> >>>> Interesting. Did the doc count change after you started the nodes again?
> >>>> Can you tell us about commits?
> >>>> Which version? 4.5 will be out soon.
> >>>>
> >>>> Otis
> >>>> Solr & ElasticSearch Support
> >>>> http://sematext.com/
> >>>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena" <ssaxena@gopivotal.com> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I am testing the High Availability feature of SolrCloud. I am using the
> >>>>> following setup:
> >>>>>
> >>>>> - 8 linux hosts
> >>>>> - 8 Shards
> >>>>> - 1 leader, 1 replica / host
> >>>>> - Using Curl for update operation
> >>>>>
> >>>>> I tried to index 80K documents on replicas (10K/replica in parallel).
> >>>>> During the indexing process, I stopped 4 leader nodes. Once indexing was
> >>>>> done, only 79808 of the 80K docs were indexed.
> >>>>>
> >>>>> Is this expected behaviour? In my opinion, the replica should take care of
> >>>>> indexing if the leader is down.
> >>>>>
> >>>>> If this is expected behaviour, are there any steps that can be taken from
> >>>>> the client side to avoid such a situation?
> >>>>>
> >>>>> Regards,
> >>>>> Saurabh Saxena
> >>>>>
> >>>>
> >
> > --
> > Walter Underwood
> > wunder@wunderwood.org
> >
> >
> >
> 
> 
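
Erick's maxDoc/numDocs check quoted above can be scripted rather than read off the admin page. A rough sketch, assuming the core exposes the Luke request handler at `/admin/luke` and that its JSON response carries `numDocs` and `maxDoc` under an `index` key. The URL and the function names `fetch_index_info` and `replaced_or_deleted` are illustrative assumptions.

```python
import json
import urllib.request

# Assumed URL; the host, port and core/collection name are placeholders.
LUKE_URL = ("http://localhost:8983/solr/test_collection"
            "/admin/luke?wt=json&numTerms=0")


def fetch_index_info(url=LUKE_URL, timeout=30):
    """Return the 'index' section of the Luke handler's JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["index"]


def replaced_or_deleted(index_info):
    """Count docs that were deleted or overwritten but not yet merged away.

    A value above zero is a hint, not proof, that some adds reused an
    existing <uniqueKey>; it drops back to zero after merges or an optimize.
    """
    return index_info["maxDoc"] - index_info["numDocs"]
```

These numbers are per core, so on a multi-shard collection the check would have to be run against each shard's core.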

