lucene-java-user mailing list archives

From "Haroldo Nascimento" <haroldo.ara...@gmail.com>
Subject Re: Time of processing hits.doc()
Date Mon, 19 Nov 2007 14:12:34 GMT
Mark,

  How can I get the Document's information? I think it has to happen in
the implementation of the abstract collect method. How do I get it?

  Below is the example from the Lucene javadoc.

Searcher searcher = new IndexSearcher(indexReader);
final BitSet bits = new BitSet(indexReader.maxDoc());
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        bits.set(doc);
    }
});

 Thanks


On Nov 18, 2007 8:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
> Hey Haroldo.
>
> First thing you need to do is *stop* using Hits in your searches. Hits
> is optimized for some pretty specific use cases and you will get along
> much better by using a HitCollector.
>
> Hits has three main functions:
>
> It caches documents, normalizes scores, and stores ids associated with
> scores (a HitDoc). If you attempt to retrieve a HitDoc past the first
> 100 from Hits, a new search will be issued to grab double the number of
> HitDocs needed to satisfy your retrieval attempt. This will be
> repeated every time you ask for a HitDoc beyond the current cache (which
> began at 100). This means that if you need to get a HitDoc beyond 100,
> Hits is not a great choice for you. You will want to use the
> HitCollector instead...but remember that you are losing the normalized
> scores (simple to copy code if you still want it) and the document
> caching (I rarely want that anyway).
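
To make the re-search behaviour described above concrete, here is a rough sketch in plain Java. It is a simplified model and not Lucene's actual code: the names HitsCacheModel and countReSearches are hypothetical, and the rule that each request past the cache triggers a search for double the requested position is an assumption taken from the description.

```java
// Simplified model of the re-search behaviour described above.
// This is NOT Lucene code: HitsCacheModel and countReSearches are
// hypothetical names, and the doubling rule is an assumption.
class HitsCacheModel {

    static final int INITIAL_CACHE = 100; // size of the first batch, per the description

    // Counts the extra searches triggered by reading hits 0..totalHits-1 in order.
    static int countReSearches(int totalHits) {
        long cached = Math.min(INITIAL_CACHE, totalHits);
        int reSearches = 0;
        for (int i = 0; i < totalHits; i++) {
            if (i >= cached) {
                // Request past the cache: search again for double the amount,
                // capped at the number of hits that actually exist.
                cached = Math.min(2L * i, totalHits);
                reSearches++;
            }
        }
        return reSearches;
    }

    public static void main(String[] args) {
        System.out.println(countReSearches(25000)); // prints 8
    }
}
```

Under this model, walking all 25,000 hits in order costs 8 extra searches, and that is before the cost of loading each stored document.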
>
> An issue to watch out for: with Hits, you do not have to ask for how
> many docs to get back, but with a HitCollector solution you will need
> to. This is a minor dilemma if you want to go over all of the hits no
> matter what. You can pass a huge number to ensure you get everything,
> but you will be creating large data structures if you do this, as
> structure sizes may be initialized by the number you pass. Also, passing
> the maximum integer will cause an error (negative init size) as Lucene
> initializes a data structure to hold the hits as n+1.
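
The "negative init size" error mentioned above comes from plain int overflow, which can be reproduced without Lucene. The sketch below only illustrates the arithmetic: OverflowDemo, capacityFor, and allocationFails are hypothetical names, and the code does not reproduce Lucene's internal data structure.

```java
// Shows why sizing a structure as n + 1 fails when n is Integer.MAX_VALUE.
// OverflowDemo and its methods are hypothetical names for illustration;
// this does not reproduce Lucene's internal code.
class OverflowDemo {

    // Mimics an "n + 1" capacity computation with int arithmetic.
    static int capacityFor(int n) {
        return n + 1; // wraps to Integer.MIN_VALUE when n == Integer.MAX_VALUE
    }

    // Returns true if allocating an array of that capacity is rejected
    // because the computed size is negative.
    static boolean allocationFails(int n) {
        try {
            Object[] slots = new Object[capacityFor(n)];
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(Integer.MAX_VALUE));     // prints -2147483648
        System.out.println(allocationFails(Integer.MAX_VALUE)); // prints true
        System.out.println(allocationFails(1000));              // prints false
    }
}
```

Passing Integer.MAX_VALUE - 1 would avoid the overflow, but the memory concern about very large up-front allocations still applies.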
>
> - Mark
>
>
> Haroldo Nascimento wrote:
> > I have a performance problem when I need to group the search results.
> >
> > I have the code below:
> >
> >    for (int i = 0; i < hits.length(); i++) {
> >        doc = hits.doc(i);
> >
> >        obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
> >        obj2 = doc.get(xxx);
> >        ...
> >    }
> >
> >   I work with a very large volume of data. The search itself runs in
> > 0.300 seconds, but when the hits object holds many results, the time
> > to get all the objects is very long. The hits.doc(i) command is
> > processed in 2 seconds.
> >
> >   For example, for hits.length() equal to 25,000 results, the
> > "post-search" time is 7 seconds.
> >
> >   I get all the results because I need to group them (remove the
> > duplicate results).
> >
> >   Is there any way in Lucene to group the results? I need something
> > like the "group by" command in SQL.
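
One way to approximate the "group by" de-duplication asked about here is to keep only the first document seen for each distinct field value. The sketch below is plain Java and makes several assumptions: GroupByKey and firstPerKey are hypothetical names, and the keys array stands in for a per-document field value that has already been obtained for each collected document id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "group by" style de-duplication over collected results.
// GroupByKey and firstPerKey are hypothetical names; the keys array stands
// in for a per-document field value obtained beforehand.
class GroupByKey {

    // Keeps the first document id seen for each distinct key, in input order.
    static List<Integer> firstPerKey(int[] docIds, String[] keys) {
        Map<String, Integer> firstDoc = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < docIds.length; i++) {
            if (!firstDoc.containsKey(keys[i])) {
                firstDoc.put(keys[i], docIds[i]);
            }
        }
        return new ArrayList<Integer>(firstDoc.values());
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 9, 12};
        String[] keys = {"SP", "RJ", "SP", "MG"};
        System.out.println(firstPerKey(docIds, keys)); // prints [3, 7, 12]
    }
}
```

The LinkedHashMap preserves the order in which keys first appear, so the surviving ids stay in the same order as the input.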
> >
> >   Thanks.
> >
>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
