lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Migrating from Hit/Hits to TopDocs/TopDocCollector
Date Wed, 10 Jun 2009 10:17:00 GMT
> You are wrong.
> As the Javadoc reads: 'Finds the top n hits for query'.
> You can set n to whatever value you want: the total number of documents
> (not results!) in your index if you want 'all', or 10 if you want the top 10.

You are right, you can, but if you just want to retrieve all hits, this is
inefficient, because the top-n search has to maintain a priority queue of n
entries ordered by score. A HitCollector is the correct way to do this
(especially because the order of hits is usually irrelevant when retrieving
all hits). Hits and TopDocs are intended for paged result lists.
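To make the distinction concrete, here is a simplified, stdlib-only model of the two approaches. It is not Lucene code: `HitSink`, `TopN`, `AllHits` and the toy `search` loop are hypothetical stand-ins for a HitCollector and for what a top-n search does internally, assuming hits arrive as (doc, score) pairs. The top-n path keeps a bounded min-heap of n entries and sorts it at the end; the collect-all path just records each doc id as it arrives. Paging is modeled the usual way for TopDocs: to show page p, request the top (p+1)*size hits and skip the first p*size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of top-n vs. collect-all hit consumption (not Lucene's API). */
public class HitModel {

    /** One matching document: its id and score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Callback in the spirit of HitCollector.collect(doc, score); hypothetical name. */
    interface HitSink {
        void collect(int doc, float score);
    }

    // "Worst" hit first: lower score, or on equal score the higher doc id.
    static final Comparator<Hit> WORST_FIRST = (a, b) ->
        a.score != b.score ? Float.compare(a.score, b.score)
                           : Integer.compare(b.doc, a.doc);

    /** Keeps only the n best hits in a bounded min-heap (the top-n approach). */
    static final class TopN implements HitSink {
        private final int n;
        private final PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST);

        TopN(int n) { this.n = n; }

        @Override
        public void collect(int doc, float score) {
            if (n <= 0) return;                  // nothing can be kept
            Hit hit = new Hit(doc, score);
            if (heap.size() < n) {
                heap.add(hit);
            } else if (WORST_FIRST.compare(hit, heap.peek()) > 0) {
                heap.poll();                     // evict the weakest kept hit
                heap.add(hit);
            }
        }

        /** Doc ids of the kept hits, best first. */
        int[] docs() {
            List<Hit> sorted = new ArrayList<>(heap);
            sorted.sort(WORST_FIRST.reversed());
            return sorted.stream().mapToInt(h -> h.doc).toArray();
        }
    }

    /** Records every match in arrival (index) order; no heap, no sorting. */
    static final class AllHits implements HitSink {
        final List<Integer> docs = new ArrayList<>();

        @Override
        public void collect(int doc, float score) { docs.add(doc); }
    }

    /** Toy "search": doc d matches when scores[d] > 0. */
    static void search(float[] scores, HitSink sink) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0) sink.collect(doc, scores[doc]);
        }
    }

    /** Page p (0-based): ask for the top (p+1)*size hits, then skip p*size. */
    static int[] page(float[] scores, int p, int size) {
        TopN top = new TopN((p + 1) * size);
        search(scores, top);
        int[] ranked = top.docs();
        int from = Math.min(p * size, ranked.length);
        return Arrays.copyOfRange(ranked, from, Math.min(from + size, ranked.length));
    }
}
```

For example, with scores {0, 0.5, 0.9, 0, 0.7, 0.1}, `page(scores, 1, 2)` returns docs 1 and 5, while `AllHits` simply receives 1, 2, 4, 5 in index order without any ranking work.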

> Anyway, it's just an example to give a direction.

Same here. I wanted to give Paul a hint on how to do it correctly and
efficiently.
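A minimal model of that hint, again stdlib-only and hypothetical: the `String[] fileByDoc` array plays the role of the FieldCache (one value of "FILE" per document id), and `FileCollector` plays the role of a HitCollector that turns each matching doc number into a File with a single array read, instead of loading the stored Document for every hit.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of "HitCollector + FieldCache" (not Lucene's API). */
public class FileCollectorModel {

    /** Stand-in for HitCollector.collect(doc, score); hypothetical name. */
    interface HitSink {
        void collect(int doc, float score);
    }

    /** Maps each matching doc number to a File via a per-document array. */
    static final class FileCollector implements HitSink {
        private final String[] fileByDoc;           // fileByDoc[doc] = value of "FILE"
        final List<File> files = new ArrayList<>();

        FileCollector(String[] fileByDoc) { this.fileByDoc = fileByDoc; }

        @Override
        public void collect(int doc, float score) {
            String name = fileByDoc[doc];           // O(1) array read, no stored-field I/O
            if (name != null) files.add(new File(name));
        }
    }

    /** Toy "search": feeds every doc id in matches to the sink. */
    static void search(int[] matches, HitSink sink) {
        for (int doc : matches) sink.collect(doc, 1.0f);
    }
}
```

In Lucene 2.x the real pieces would be a subclass of `HitCollector` overriding `collect(int doc, float score)`, passed to `Searcher.search(Query, HitCollector)`, with the array obtained from `FieldCache.DEFAULT.getStrings(reader, "FILE")` on the same reader the searcher uses. FieldCache builds its values from indexed terms, so this only works if "FILE" is indexed as a single untokenized term per document.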

> Wouter
> 
> > This code snippet would only work if you want to iterate over, e.g., the
> > first 20 documents (which is n in your code). If he wants to iterate over
> > all results, he should think about using a custom (Hit)Collector.
> >
> > The code below will be very slow for large result sets, because
> > retrieving stored fields is not efficient for a large number of documents
> > (see the warning about the "inner search loop" in the Wiki). To retrieve
> > just e.g. a filename, it may really be better to use a FieldCache on the
> > "FILE" field and, inside the HitCollector, use the doc number to get the
> > filename from the cache. I think it will be far more than 10 times
> > faster!
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >> -----Original Message-----
> >> From: Wouter Heijke [mailto:wheijke@xs4all.nl]
> >> Sent: Wednesday, June 10, 2009 11:44 AM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Migrating from Hit/Hits to TopDocs/TopDocCollector
> >>
> >>
> >> Will this do?
> >>
> >> IndexReader indexReader = searcher.getIndexReader();
> >> // Finds the top n hits for query
> >> TopDocs topDocs = searcher.search(query, n);
> >> for (int i = 0; i < topDocs.scoreDocs.length; i++) {
> >>   Document document = indexReader.document(topDocs.scoreDocs[i].doc);
> >>   final File f = new File(document.get("FILE"));
> >> }
> >>
> >>
> >> > I have existing code that's like:
> >> >
> >> >     final Term t = /* ... */;
> >> >     final Iterator i = searcher.search(new TermQuery(t)).iterator();
> >> >     while (i.hasNext()) {
> >> >         final Hit hit = (Hit) i.next();
> >> >         // "FILE" is the field that recorded the original file indexed
> >> >         final File f = new File(hit.get("FILE"));
> >> >         // ...
> >> >     }
> >> >
> >> > It's not clear to me how to rewrite the code using TopDocs/
> >> > TopDocCollector and how to iterate over the results.
> >> >
> >> > A little help?  Thanks.  :-)
> >> >
> >> > - Paul
> >> >
> >>
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org




