lucene-solr-user mailing list archives

From Ankit Bhatnagar <abhatna...@vantage.com>
Subject RE: Documents disappearing
Date Fri, 19 Feb 2010 20:27:57 GMT
Try inspecting your index with Luke.
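For a quick check without the Luke GUI, the same index statistics can be read over HTTP from Solr's LukeRequestHandler. This is a minimal sketch. It assumes the handler is reachable at http://localhost:8983/solr/myindex/admin/luke (the host, port and core path are assumptions to adjust) and that the JSON response carries numDocs and maxDoc under an "index" key. A gap between maxDoc and numDocs counts documents that were deleted or overwritten but not yet merged away.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: adjust host, port and core name to your installation.
LUKE_URL = "http://localhost:8983/solr/myindex/admin/luke?numTerms=0&wt=json"


def fetch_luke_stats(url: str = LUKE_URL) -> dict:
    """Fetch the LukeRequestHandler response and return its 'index' section."""
    with urlopen(url) as resp:
        return json.load(resp)["index"]


def deleted_docs(index_stats: dict) -> int:
    """Documents marked deleted (or overwritten) but not yet merged away."""
    return index_stats["maxDoc"] - index_stats["numDocs"]
```

Usage: `deleted_docs(fetch_luke_stats())`. A non-zero result after a run in which no delete was sent would show that something in the update path is removing documents.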


Ankit


-----Original Message-----
From: Pascal Dimassimo [mailto:thesuperdim@hotmail.com] 
Sent: Friday, February 19, 2010 2:22 PM
To: solr-user@lucene.apache.org
Subject: Documents disappearing


Hi,

I have encountered a situation that I can't explain. We are indexing documents
that are often duplicates, so we activated deduplication like this:

<processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <bool name="overwriteDupes">true</bool>
      <str name="signatureField">signature</str>
      <str name="fields">title,text</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>
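To make the configuration concrete, here is a toy model of what it asks Solr to do: compute a signature from the title and text fields and, with overwriteDupes enabled, replace any existing document carrying the same signature. This is a sketch under stated assumptions, not Solr's implementation. It substitutes hashlib's MD5 for Lookup3Signature (which is not in the Python standard library), and the ToyIndex class, its field separator and its overwrite rule are hypothetical simplifications. In particular, it assumes documents with the same id still replace each other.

```python
import hashlib
from dataclasses import dataclass, field

SIGNATURE_FIELDS = ("title", "text")  # mirrors <str name="fields">title,text</str>


def signature(doc: dict) -> str:
    # Stand-in hash: MD5 over the configured fields (Solr uses Lookup3Signature).
    h = hashlib.md5()
    for name in SIGNATURE_FIELDS:
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so field boundaries affect the hash
    return h.hexdigest()


@dataclass
class ToyIndex:
    overwrite_dupes: bool = True
    docs: dict = field(default_factory=dict)  # id -> stored document

    def add(self, doc: dict) -> None:
        sig = signature(doc)
        if self.overwrite_dupes:
            # Remove every existing document that carries the same signature.
            for doc_id in [i for i, d in self.docs.items() if d["signature"] == sig]:
                del self.docs[doc_id]
        # Assumption: a document with the same id still replaces the old one.
        self.docs[doc["id"]] = {**doc, "signature": sig}

    def num_docs(self) -> int:
        return len(self.docs)


idx = ToyIndex()
idx.add({"id": "1", "title": "A", "text": "same body"})
idx.add({"id": "2", "title": "A", "text": "same body"})  # same content, new id
print(idx.num_docs())  # doc "1" was replaced by doc "2", so this prints 1
```

In this model, duplicate content never raises the count, but an add can remove documents whose ids differ from the one being sent.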

What I can't explain is that when I look at the document count in the log
(the hits value of each newSearcher event), I see documents disappearing.

11:24:23 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
14:04:24 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
14:17:07 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
14:25:42 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
14:47:12 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
15:17:22 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
15:47:31 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
16:17:42 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
16:38:17 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
16:39:10 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
16:47:40 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
16:51:24 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
17:02:13 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
17:17:41 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8
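The drop is easier to pin down if the hits= value is pulled out of each newSearcher entry and compared with the previous one. A small sketch, assuming each log entry occupies one line as it would in the raw log file; the sample lines are copied from the excerpt above, and the function names are illustrative.

```python
import re

# Matches "HH:MM:SS INFO ... hits=N ..." on a single log line.
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2}) INFO .*\bhits=(\d+)\b")


def hit_counts(lines):
    """Return (time, hits) pairs for every matching log line."""
    out = []
    for line in lines:
        m = ENTRY.match(line)
        if m:
            out.append((m.group(1), int(m.group(2))))
    return out


def drops(counts):
    """Return (time, previous, current) wherever the hit count decreased."""
    return [(t, prev, cur)
            for (_, prev), (t, cur) in zip(counts, counts[1:])
            if cur < prev]


LOG = """\
14:47:12 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
15:17:22 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
15:47:31 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
16:17:42 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
16:38:17 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
16:39:10 INFO  - [myindex] webapp=null path=null params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
"""

for t, prev, cur in drops(hit_counts(LOG.splitlines())):
    print(f"{t}: {prev} -> {cur} ({cur - prev})")
```

On this excerpt the count falls at three consecutive commits, from a peak of 10861 to 7725, a net loss of 3136 documents.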

Solr was started at 11:24 that day, and we started indexing around 13:30.

At some point during indexing, I noticed that a batch of documents was
resent (i.e., documents with the same id field were sent to the index again).
According to the log, NO delete was sent to Solr.

I understand that if I send duplicates (either documents with the same id or
with the same signature), the document count should stay the same. But how
can we explain that it is decreasing? What are the possible causes of this
behavior?

Thanks! 
-- 
View this message in context: http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
Sent from the Solr - User mailing list archive at Nabble.com.

