lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: Fastest Method for Searching (need all results)
Date Fri, 21 Jul 2006 18:59:20 GMT
Ryan O'Hara wrote:
> My index contains approximately 5 millions documents.  During a 
> search, I need to grab the value of a field for every document in the 
> result set.  I am currently using a HitCollector to search.  Below is 
> my code:
>, new HitCollector(){
>                         public void collect(int doc, float score){
>                                 if(searcher.doc(doc).get("SYM") != null){
> addSymbolsToHash(searcher.doc(doc).get("SYM").split("ENDOFSYM"));
>                                 }
>                         }
>                     });
> This is fairly fast for small and medium-sized result sets.  However, 
> it gets slow as the result set grows.  I read this on HitCollector's 
> API page:
> "For good search performance, implementations of this method should 
> not call Searcher.doc(int) or Reader.document(int) on every document 
> number encountered. Doing so can slow searches by an order of 
> magnitude or more."
> Along with this implementation, I've also tried using FieldCache.  
> This faired better with large-sized result sets, but worse with small 
> and medium-sized result sets.  Anyone have any ideas of what the best 
> approach might be?
> Thanks a lot,
> Ryan
Perhaps I am speaking too quickly, but I would try by not grabbing the 
value of the field for every document in the results set. Someone will 
see that value or use it for a couple million hits? Could be I 
suppose...but if not than axe it. Grab the first few thousand (or MUCH 
less) and if they need more head back in and grab more.

- mark

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message