lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Time of processing hits.doc()
Date Mon, 19 Nov 2007 15:05:48 GMT
I think, based on your previous question, that you just need to use  
the search() method that returns TopDocs, not the lower-level  
HitCollector method.  From the TopDocs, you can then access the  
ScoreDoc, which will give you info about the doc and the score.  See http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/TopDocsTest.java

  from my Lucene Boot Camp training class for a really simple example.

-Grant

On Nov 19, 2007, at 9:12 AM, Haroldo Nascimento wrote:

> Mark,
>
>  How I can get the information of Document. I think that is in the
> implementation do method abstract collect. How I can get it .
>
>  Below is the example of javadoc the Lucene.
>
> Searcher searcher = new IndexSearcher(indexReader);
>   final BitSet bits = new BitSet(indexReader.maxDoc());
>   searcher.search(query, new HitCollector() {
>       public void collect(int doc, float score) {
>         bits.set(doc);
>       }
>     });
>
> Thanks
>
>
> On Nov 18, 2007 8:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> Hey Haroldo.
>>
>> First thing you need to do is *stop* using Hits in your searches.  
>> Hits
>> is optimized for some pretty specific use cases and you will get  
>> along
>> much better by using a HitCollector.
>>
>> Hits has three main functions:
>>
>> It caches documents, normalizes scores, and stores ids associated  
>> with
>> scores (a HitDoc). If you attempt to retrieve a HitDoc past the first
>> 100 from Hits, a new search will be issued to grab double the  
>> required
>> HitDocs needed to satisfy your HitDoc retrieval attempt. This will be
>> repeated everytime you ask for a HitDoc beyond the current cache  
>> (which
>> began at 100). This means that if you need to get a HitDoc beyond  
>> 100,
>> Hits is not a great choice for you. You will want to use the
>> HitCollector instead...but remember that you are losing the  
>> normalized
>> scores (simple to copy code if you still want it) and the document
>> caching (I rarely want that anyway).
>>
>> An issue to watch out for: with Hits, you do not have to ask for how
>> many docs to get back, but with a HitCollector solution you will need
>> to. This is a minor dilema if you want to go over all of the hits no
>> matter what. You can pass a huge number to ensure you get everything,
>> but you will be creating large data structures if you do this, as
>> structure sizes may be initialized by the number you pass. Also,  
>> passing
>> the maximum integer will cause an error (negative init size) as  
>> Lucene
>> initializes a data structure to hold the hits as n+1.
>>
>> - Mark
>>
>>
>> Haroldo Nascimento wrote:
>>> I have a problem of performance when I need group the result do  
>>> search
>>>
>>> I have the code below:
>>>
>>>   for (int i = 0; i < hits.length(); i++) {
>>>                    doc = hits.doc(i);
>>>
>>>                    obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
>>>                    obj2 = doc.get(xxx);
>>>                    ...
>>>   }
>>>
>>>  I work with volume of data very big. The search process in 0.300
>>> seconds but when the object hits have much results, the time for get
>>> all objects is very big. The command hits.doc(i) is processed in 2
>>> second.
>>>
>>>  Por exemplo. For hits.length() equals the 25.000 results, the time
>>> of "pos search" is 7 seconds.
>>>
>>>  I get all result because I need group the result (remove the
>>> duplicate results).
>>>
>>>  Is there any form in Lucene that group the result. I need of
>>> anything as the command "group by" of sql.
>>>
>>>  Thanks.
>>>
>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message