lucene-solr-user mailing list archives

From: "Geary, Frank" <frank.ge...@zoominfo.com>
Subject: RE: Solr 4.5.0 replication numDocs larger in slave
Date: Tue, 04 Mar 2014 20:00:41 GMT

Here's what I believe is my solution:
 
Yesterday I changed "nrtMode" to false in my solrconfig.xml (see the example solrconfig.xml
for more info) on each master and slave server.  And as of today the numDocs are the same
in each master/slave pair - but I'll continue watching this for a bit.  

Anyway, I believe that the numDocs in the master was jumping ahead of the slave due to
nrtMode being set to true (which is the default).  Having nrtMode set to true causes the IndexReaders
to be reopened from the IndexWriter after the commit, and thus, if my guess is right, the
IndexWriter is effectively soft committing some adds and deletes on a regular basis, even though
I did not explicitly turn on any soft committing that I know of.  So any time an IndexReader
is reopened from the IndexWriter, you will see those soft commits.  But setting nrtMode
to false causes the IndexReaders to be reopened from the Directory, which will never see those
soft commits - only the hard commits.  And of course on the slave side, after replication,
the slave never sees any soft commits, only hard commits from the Directory.
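
A minimal sketch of that change, assuming the layout of the Solr 4.x example solrconfig.xml,
where nrtMode sits inside indexConfig (the placement is an assumption here, so check the comments
in your own example file; every other indexConfig setting is omitted):

```xml
<!-- solrconfig.xml (fragment): only the setting discussed above is shown. -->
<indexConfig>
  <!-- true (the default): IndexReaders are reopened from the IndexWriter.
       false: IndexReaders are reopened from the Directory, so they only
       see hard commits. -->
  <nrtMode>false</nrtMode>
</indexConfig>
```

The same value would go on each master and slave, followed by a restart or core reload so that it
takes effect.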

Frank 

-----Original Message-----
From: Geary, Frank 
Sent: Monday, March 03, 2014 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.5.0 replication numDocs larger in slave

Thanks Greg.  We optimize the master once a week (early in the day Sunday) and we do not do
a commit Sunday evening (the only evening of the week when we do not commit).  So now after
optimization/replication the master/slave pair that were out of sync on Friday now have the
same numDocs (and every other value on the Overview page agrees except "size" under Replication
where it shows the slave is smaller).  Unfortunately, a different master/slave pair now have
different numDocs after the optimize and replication done yesterday.  

For the newly out of sync master/slave pair, the Version (Under Statistics on the Overview
page) is 4 revisions earlier on the slave than on the master and there are two fewer segments
on the slave than there are on the master.  Under Replication on the Overview page, the Versions
and Gens are all the same, but the size of the slave is smaller than the master.  The slave
has 51 fewer documents than the master.  Indexing is continuing on the master, though no
commit has happened since the optimization early Sunday.

I wonder if this is related to the NRT functionality in some way.  I see "Impl: org.apache.solr.core.NRTCachingDirectoryFactory"
on the Overview page.  I've been trying to rely on default behavior whenever possible.  But
perhaps I need to turn something off? 

Frank

-----Original Message-----
From: Greg Walters [mailto:greg.walters@answers.com]
Sent: Monday, March 03, 2014 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.5.0 replication numDocs larger in slave

I just ran into an issue similar to this that affected document scores on distributed searches.
You might try doing an optimize and purging your deleted documents while no indexing is being
done then checking your counts. Once I optimized all my indexes the document counts on all
of my cores matched up and scoring was consistent.

Thanks,
Greg

On Feb 28, 2014, at 8:22 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> That really shouldn't be happening IF indexing is shut off. Otherwise 
> the slave is taking a snapshot of the master index and synching.
> 
> bq: The slave has about 33 more documents and one fewer segment
> (according to Overview in solr admin)
> 
> Sounds like the master is still indexing and you've deleted documents 
> on the master.
> 
> Best,
> Erick
> 
> 
> On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank <frank.geary@zoominfo.com> wrote:
> 
>> Hi,
>> 
>> I'm using Solr 4.5.0, I have a single master replicating to a single 
>> slave.  Only the master is being indexed to - never the slave.  The 
>> master is committed once each night.  After the first commit and 
>> replication the numDoc counts are identical.  After the next nightly 
>> commit and after the second replication a few minutes later, the 
>> numDocs has increased in both the master and the slave as expected, 
>> but numDocs is not the same in the master as it is in the slave.  The 
>> slave has about 33 more documents and one fewer segment (according to
>> Overview in solr admin).
>> 
>> I suspect the numDocs may be in sync again after tonight, but can anyone
>> explain what is going on here?   Is it possible a few deletions got
>> committed to the master but not replicated to the slave?
>> 
>> Thanks
>> 
>> Frank
>> 
>> 
>> 

