lucene-dev mailing list archives

From DM Smith <dmsmith...@gmail.com>
Subject Re: Searcher javadoc problem
Date Sun, 04 Oct 2009 02:48:09 GMT
On Oct 3, 2009, at 9:23 PM, Mark Miller <markrmiller@gmail.com> wrote:

> Gotcha - that clears things up for me. I know you're an advanced user, so
> it threw me for a loop that you would be using Hits like a Collector. I
> have just been seeing that a lot lately.

Is there enough interest to add a new search method? (Hiterator??
Maybe a parameter on a Collector??) It would return a stream of hits,
one per call to next(). I guess it should take a Filter. The abstraction
would make no assumption about order; an implementation can define one.
In my case it would be doc order when not parallel.
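
One way to picture the proposal is the sketch below. It is not a Lucene API: the class name `Hiterator`, the `maxDoc` bound, and the two `IntPredicate` arguments are hypothetical stand-ins for the searcher, the Query, and the Filter. The only order it implements is document order, and no score is computed.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;

/** Hypothetical sketch, not part of Lucene: yields matching doc IDs lazily, in document order. */
final class Hiterator implements Iterator<Integer> {
    private final int maxDoc;          // one past the highest doc ID (assumed known up front)
    private final IntPredicate query;  // stand-in for a Lucene Query
    private final IntPredicate filter; // stand-in for a Lucene Filter; null means "no filter"
    private int next = 0;              // next candidate doc ID to test

    Hiterator(int maxDoc, IntPredicate query, IntPredicate filter) {
        this.maxDoc = maxDoc;
        this.query = query;
        this.filter = filter;
        advance(); // position on the first match, if any
    }

    // Move 'next' forward to the first doc accepted by both query and filter, or to maxDoc.
    private void advance() {
        while (next < maxDoc
                && !(query.test(next) && (filter == null || filter.test(next)))) {
            next++;
        }
    }

    @Override
    public boolean hasNext() {
        return next < maxDoc;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int doc = next++;
        advance(); // find the following match only when the caller asks for one
        return doc;
    }
}
```

A caller would loop with `while (it.hasNext()) { int doc = it.next(); ... }` and could stop early without paying for the remaining matches.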

Btw the use case parallels a lookup on an RDBMS table: Find all  
matching records and let the app handle the ordering and slicing.
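
The split of responsibilities in that analogy can be sketched as follows. The names (`MatchSet`, `collect`, `count`, `slice`) are hypothetical and independent of Lucene: the search side only records which doc IDs matched, and the application counts and slices them in document order.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch, not a Lucene class: records matches without scores; the app slices them. */
final class MatchSet {
    private final BitSet docs = new BitSet();

    /** Called once per matching doc ID, in any order; no score is computed or stored. */
    void collect(int doc) {
        docs.set(doc);
    }

    /** Total number of distinct matches, e.g. for a "N results found" display. */
    int count() {
        return docs.cardinality();
    }

    /** Returns up to 'size' doc IDs in document order, skipping the first 'offset' matches. */
    List<Integer> slice(int offset, int size) {
        List<Integer> page = new ArrayList<>();
        int skipped = 0;
        for (int doc = docs.nextSetBit(0);
                doc >= 0 && page.size() < size;
                doc = docs.nextSetBit(doc + 1)) {
            if (skipped < offset) {
                skipped++;      // still before the requested slice
            } else {
                page.add(doc);  // inside the requested slice
            }
        }
        return page;
    }
}
```

With this shape, the first page of results would be `slice(0, pageSize)` and the next `slice(pageSize, pageSize)`; the page size is the application's choice.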

--DM


>
> I just read too much into: So what is the appropriate documentation for
> getting all "hits"?
>
> Another option (of course) is to maintain your own Hits class. Sounds
> like working up something with a Collector on your own would be better
> though - why compute the score if you don't need it? Hits caching was
> rarely that useful either.
>
> DM Smith wrote:
>> It makes sense if you understand the context. We make each verse of a
>> Bible a document. There are about 36000 docs in a Bible. We want a
>> user to find all the verses that match their search and to give the
>> count of total hits. We then show slices of the hits, from first hit to
>> last in document order, typically about 100 at a time. Scoring is
>> unimportant.
>>
>> The user can also choose to prioritize and limit the results. This
>> uses scoring and the top docs. This is not the user's preferred search.
>>
>> So I don't mind being nasty. But having looked at it, I think it would
>> be better to have a non-scoring collector that runs as a co-process
>> with an iterator interface, returning the next doc on demand, from
>> first doc in index to last.
>>
>> -- DM 
>>
>>
>> On Oct 3, 2009, at 6:12 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>> You used Hits to get all the hits? Nasty, man - that's why we
>>> deprecated that class. Even though the JavaDoc warns you that's a major
>>> speed trap, everyone still did it ... use a Collector.
>>>
>>> You're right though - it shouldn't point to IndexSearcher.search(Query)
>>> after that - it should point to IndexSearcher.search(Query, int).
>>>
>>> Got to fix that.
>>>
>>> DM Smith wrote:
>>>> I'm working on migrating my code to 2.9, and I'm trying to figure out
>>>> what to do. Along the way I found a circular argument in the JavaDoc
>>>> for Searcher. BTW, this is not a user question.
>>>>
>>>> My current code calls:
>>>>   Hits hits = searcher.search(query);
>>>>
>>>> The JavaDoc for it says:
>>>> /** Returns the documents matching <code>query</code>.
>>>>  * @throws BooleanQuery.TooManyClauses
>>>>  * @deprecated Hits will be removed in Lucene 3.0. Use
>>>>  * {@link #search(Query, Filter, int)} instead.
>>>>  */
>>>> public final Hits search(Query query) throws IOException {
>>>>   return search(query, (Filter)null);
>>>> }
>>>>
>>>> However, search(Query, Filter, int) is not quite appropriate, as I need
>>>> all hits. I guess I could pass null for the filter and
>>>> Integer.MAX_VALUE for the count.
>>>>
>>>> So, I found search(Query, Collector), which seems most appropriate.
>>>> (Not sure though, but I'll figure it out.) However, the JavaDoc for it
>>>> says:
>>>> /** Lower-level search API.
>>>>  *
>>>>  * <p>{@link Collector#collect(int)} is called for every matching
>>>>  * document.
>>>>  *
>>>>  * <p>Applications should only use this if they need <i>all</i> of the
>>>>  * matching documents.  The high-level search API ({@link
>>>>  * Searcher#search(Query)}) is usually more efficient, as it skips
>>>>  * non-high-scoring hits.
>>>>  * <p>Note: The <code>score</code> passed to this method is a raw score.
>>>>  * In other words, the score will not necessarily be a float whose
>>>>  * value is between 0 and 1.
>>>>  * @throws BooleanQuery.TooManyClauses
>>>>  */
>>>> public void search(Query query, Collector results)
>>>>  throws IOException {
>>>>  search(createWeight(query), null, results);
>>>> }
>>>>
>>>> But Searcher.search(Query) is deprecated.
>>>>
>>>> So what is the appropriate documentation for getting all "hits"? It
>>>> seems to say, "Don't do that."
>>>>
>>>> -- DM
>>>>
>>>>
>>>
>>>
>>> -- 
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>
>
> -- 
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>


