lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Haroldo Nascimento" <haroldo.ara...@gmail.com>
Subject Re: Time of processing hits.doc()
Date Mon, 19 Nov 2007 16:05:55 GMT
German,

  What I need is similar to the your site
http://listados.deremate.com.ar/panaderia .
  I have many results of search, but I show any result (for example:
first 10 for first page) , but for create the options of filter of
location I need read all results fof search. The problem of
performance is when I have 30.000 results.

  How you get the filter the "ubication" ? You need read all results ?

  thanks

On Nov 19, 2007 12:05 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> I think, based on your previous question, that you just need to use
> the search() method that returns TopDocs, not the lower-level
> HitCollector method.  From the TopDocs, you can then access the
> ScoreDoc, which will give you info about the doc and the score.  See http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/TopDocsTest.java
>  from my Lucene Boot Camp training class for a really simple example.
>
> -Grant
>
>
> On Nov 19, 2007, at 9:12 AM, Haroldo Nascimento wrote:
>
> > Mark,
> >
> >  How I can get the information of Document. I think that is in the
> > implementation do method abstract collect. How I can get it .
> >
> >  Below is the example of javadoc the Lucene.
> >
> > Searcher searcher = new IndexSearcher(indexReader);
> >   final BitSet bits = new BitSet(indexReader.maxDoc());
> >   searcher.search(query, new HitCollector() {
> >       public void collect(int doc, float score) {
> >         bits.set(doc);
> >       }
> >     });
> >
> > Thanks
> >
> >
> > On Nov 18, 2007 8:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >> Hey Haroldo.
> >>
> >> First thing you need to do is *stop* using Hits in your searches.
> >> Hits
> >> is optimized for some pretty specific use cases and you will get
> >> along
> >> much better by using a HitCollector.
> >>
> >> Hits has three main functions:
> >>
> >> It caches documents, normalizes scores, and stores ids associated
> >> with
> >> scores (a HitDoc). If you attempt to retrieve a HitDoc past the first
> >> 100 from Hits, a new search will be issued to grab double the
> >> required
> >> HitDocs needed to satisfy your HitDoc retrieval attempt. This will be
> >> repeated everytime you ask for a HitDoc beyond the current cache
> >> (which
> >> began at 100). This means that if you need to get a HitDoc beyond
> >> 100,
> >> Hits is not a great choice for you. You will want to use the
> >> HitCollector instead...but remember that you are losing the
> >> normalized
> >> scores (simple to copy code if you still want it) and the document
> >> caching (I rarely want that anyway).
> >>
> >> An issue to watch out for: with Hits, you do not have to ask for how
> >> many docs to get back, but with a HitCollector solution you will need
> >> to. This is a minor dilema if you want to go over all of the hits no
> >> matter what. You can pass a huge number to ensure you get everything,
> >> but you will be creating large data structures if you do this, as
> >> structure sizes may be initialized by the number you pass. Also,
> >> passing
> >> the maximum integer will cause an error (negative init size) as
> >> Lucene
> >> initializes a data structure to hold the hits as n+1.
> >>
> >> - Mark
> >>
> >>
> >> Haroldo Nascimento wrote:
> >>> I have a problem of performance when I need group the result do
> >>> search
> >>>
> >>> I have the code below:
> >>>
> >>>   for (int i = 0; i < hits.length(); i++) {
> >>>                    doc = hits.doc(i);
> >>>
> >>>                    obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
> >>>                    obj2 = doc.get(xxx);
> >>>                    ...
> >>>   }
> >>>
> >>>  I work with volume of data very big. The search process in 0.300
> >>> seconds but when the object hits have much results, the time for get
> >>> all objects is very big. The command hits.doc(i) is processed in 2
> >>> second.
> >>>
> >>>  Por exemplo. For hits.length() equals the 25.000 results, the time
> >>> of "pos search" is 7 seconds.
> >>>
> >>>  I get all result because I need group the result (remove the
> >>> duplicate results).
> >>>
> >>>  Is there any form in Lucene that group the result. I need of
> >>> anything as the command "group by" of sql.
> >>>
> >>>  Thanks.
> >>>
> >>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message