lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: best way to iterate through all docs from a query
Date Fri, 20 Nov 2009 16:49:08 GMT
The doc IDs should be consistent *unless* you did anything to the
index, things you might not think would change anything. For instance,
any kind of commit (assuming you'd ever deleted a document, say). etc.

So if you haven't changed your index at all, your doc IDs won't change.
But as I said, some operations you don't think would change your doc IDs
actually will....

HTH
Erick

On Fri, Nov 20, 2009 at 10:17 AM, it99 <deswiatlowski@syrres.com> wrote:

>
> Thanks that helped a lot with the speed!!
> I am getting same search results but with different docIds. Is this
> expected
> and OK? Are they just arbitrar numbers
>
> If  I changed from
>            Hits hits = mSearcher.search(query, filter);
>
>
> To the following
>             TopDocCollector collector = new TopDocCollector(1000000);
>             mSearcher.search(query, filter,collector);
>            hits = collector.topDocs().scoreDocs;
>
>
>
>
>
>
>
> Ian Lea wrote:
> >
> > First queries are often slow and subsequent ones faster.  Search the
> > list for warming - I think there was something on it in the last
> > couple of days.  Or read the "When measuring performance, disregard
> > the first query" bit of
> > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
> >
> > A good number to pass to the Collector is however many docs you are
> > going to be interested in.  If you are just going to display the first
> > 10, pass 10.
> >
> >
> > --
> > Ian.
> >
> >
> > On Thu, Nov 19, 2009 at 3:36 PM, it99 <deswiatlowski@syrres.com> wrote:
> >>
> >> What is the best way to iterate across all the documents in a search
> >> results?
> >> Previously I was using the deprecated Hits object but changed the
> >> implentations as recommended in javadocs to ScoreDoc.
> >>
> >> I've tried the following but I've seen warning about peformance.
> >> Seems the first time I query something it takes long time and then after
> >> that it is quick.
> >>
> >>
> >>
> >>                for (int i = 0; i < mNumberOfHits; i++)
> >>                {
> >>
> >>                    int docId = hits[i].doc;
> >>                    Document doc = searcher.doc(docId);
> >>                }
> >>
> >> Here's the code for the search
> >> What is good number to pass intot TopDocCollector?
> >>
> >>            TopDocCollector collector = new TopDocCollector(1000000);
> >>            searcher.search(query, collector);
> >>            ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26421373.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26442733.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message