lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Documents disappearing
Date Sat, 20 Feb 2010 02:21:46 GMT
Pascal,

Look at the difference between numDocs and maxDoc.  That delta represents deleted docs.
Maybe there is something deleting your docs after all!
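
To put a number on that delta, here is a minimal sketch using the two values from the LukeRequestHandler output quoted below. The variable names are only illustrative. Note that the delta counts deleted docs that segment merges have not yet purged, and that overwriting an existing document is recorded as a delete followed by an add.

```python
# Index statistics as reported by LukeRequestHandler in the quoted message.
num_docs = 7725    # live, searchable documents
max_doc = 28099    # document slots in the index, including deleted ones

# Slots held by documents that are marked deleted but not yet merged away.
deleted_docs = max_doc - num_docs
deleted_ratio = deleted_docs / max_doc

print(f"deleted docs: {deleted_docs}")
print(f"share of index slots held by deletions: {deleted_ratio:.1%}")
```

That is 20374 deleted docs against 7725 live ones, far more than a single resent batch would normally account for.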

Otis
----Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Pascal Dimassimo <thesuperdim@hotmail.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, February 19, 2010 3:50:26 PM
> Subject: RE: Documents disappearing
> 
> 
> Using LukeRequestHandler, I see:
> 
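> <int name="numDocs">7725</int>
> <int name="maxDoc">28099</int>
> <int name="numTerms">758826</int>
> <long name="version">1266355690710</long>
> <bool name="optimized">false</bool>
> <bool name="current">true</bool>
> <bool name="hasDeletions">true</bool>
> <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index</str>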
> 
> 
> I will copy the index to my local machine so I can open it with Luke. Should
> I look for something specific?
> 
> Thanks!
> 
> 
> ANKITBHATNAGAR wrote:
> > 
> > Try inspecting your index with Luke.
> > 
> > 
> > Ankit
> > 
> > 
> > -----Original Message-----
> > From: Pascal Dimassimo [mailto:thesuperdim@hotmail.com] 
> > Sent: Friday, February 19, 2010 2:22 PM
> > To: solr-user@lucene.apache.org
> > Subject: Documents disappearing
> > 
> > 
> > Hi,
> > 
> > I have encountered a situation that I can't explain. We are indexing
> > documents that are often duplicates, so we activated deduplication like
> > this:
> > 
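> > <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
> >      <bool name="enabled">true</bool>
> >      <bool name="overwriteDupes">true</bool>
> >      <str name="signatureField">signature</str>
> >      <str name="fields">title,text</str>
> >      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
> > </processor>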
> > 
> > What I can't explain is that when I look at the document count in the
> > log, I see documents disappearing.
> > 
> > 11:24:23 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
> > 14:04:24 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
> > 14:17:07 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
> > 14:25:42 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
> > 14:47:12 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
> > 15:17:22 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
> > 15:47:31 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
> > 16:17:42 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
> > 16:38:17 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
> > 16:39:10 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
> > 16:47:40 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
> > 16:51:24 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
> > 17:02:13 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
> > 17:17:41 INFO  - [myindex] webapp=null path=null
> > params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8
> > 
> > 11:24 was the time at which Solr was started that day. Around 13:30, we
> > started indexing.
> > 
> > At some point during indexing, I noticed that a batch of documents was
> > resent (i.e., documents with the same id field were sent again to the
> > index). And according to the log, NO delete was sent to Solr.
> > 
> > I understand that if I send duplicates (either documents with the same id
> > or with the same signature), the document count should stay the same. But
> > how can we explain that it is decreasing? What are the possible causes of
> > this behavior?
> > 
> > Thanks! 
> > -- 
> > View this message in context:
> > http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> > 
> > 
> > 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Documents-disappearing-tp27659047p27660077.html
> Sent from the Solr - User mailing list archive at Nabble.com.
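
One way the count could drop without any explicit delete is sketched below. This is a toy in-memory model, not Solr's code. It assumes that with overwriteDupes enabled an add removes every existing document with the same signature, and that the unique id key still removes any document with the same id. The add_document helper, the ids, and the signature strings are all hypothetical.

```python
def add_document(index, doc_id, signature):
    """Add a doc to the toy index, first dropping any live doc that
    shares its id OR its signature (the assumed overwrite behavior)."""
    survivors = [
        d for d in index
        if d["id"] != doc_id and d["signature"] != signature
    ]
    deleted = len(index) - len(survivors)
    survivors.append({"id": doc_id, "signature": signature})
    return survivors, deleted


index = []
index, _ = add_document(index, "a", "sig-1")
index, _ = add_document(index, "b", "sig-2")
count_before = len(index)

# Document "a" is resent, but its title/text now hash to the signature
# that document "b" already carries (hypothetical scenario).
index, deleted = add_document(index, "a", "sig-2")
count_after = len(index)

print(count_before, count_after, deleted)
```

In this model a single add removes two documents (the old "a" by id and "b" by signature) and inserts one, so the live count falls from 2 to 1 even though no delete request was ever sent.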

