lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: ScoreDoc
Date Sun, 09 Nov 2008 23:05:19 GMT
Excuse me. Some unchecked logic there concerning HitCollector. A 
HitCollector hits all matching documents, not all documents. Sometimes 
that can be a lot. With TopDocs, you only ask for the Top scoring 
documents, which is usually a lesser number than all matching docs, and 
generally what people are interested in rather than all matching docs. 
Sorry for the confusion there - need to double check what I write...

Mark Miller wrote:
> Their is definitely some stale javadoc in Lucene here and there. All 
> of what your talking about has been shaken up recently with the 
> deprecation of Hits. Hits used to pretty much be considered the 
> non-expert API, but its been tossed in favor of the TopDoc API's.
> The HitCollector stuff has been marked expert because a lot of people 
> get into trouble using something that hits every doc in the index on a 
> search, not just the matching docs from the search. If you don't 
> understand whats going on, you can, and many have, make some pretty 
> slow code. The expert stuff just means, understand whats going on 
> before you start to play here ;) I don't necessarily think it doesn't 
> belong in a tutorial - assuming the guy who wrote the tutorial 
> understood what he was doing.
> As for the stale java-doc though, I'm sure patches would be welcome ;) 
> Its a group of volunteers all scratching their own itches here, so its 
> likely you will find things like that. Best bet is to pitch in when 
> you see it, and I'm sure one of the commiters will apply your patch if 
> its appropriate.
> - Mark
> ChadDavis wrote:
>> In fact, the search method used to populate the collector used in that
>> sample code also claims to be low level.  It suggests using the
>> query ) method instead, but that method is 
>> deprecated.
>> Lower-level search API.
>>> HitCollector.collect(int,float) is called for every matching document.
>>> Applications should only use this if they need *all* of the matching
>>> documents. The high-level search API ( is 
>>> usually
>>> more efficient, as it skips non-high-scoring hits.
>>> Note: The score passed to this method is a raw score. In other 
>>> words, the
>>> score will not necessarily be a float whose value is between 0 and 1.
>> Is this just stale documentation ?
>> On Sun, Nov 9, 2008 at 3:28 PM, ChadDavis 
>> <>wrote:
>>> The sample code uses a ScoreDoc array to hold the hits.
>>>     ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>> But the JavaDoc says "Expert: Returned by low-level search
>>> implementations."  Why would the tutorial sample code use an 
>>> "expert" api?

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message