lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: best way to iterate through all docs from a query
Date Sat, 21 Nov 2009 00:16:28 GMT
Unless I'm way off base, that's what the ScoreDoc[] array is all about: the
index into the array is the position of the hit in the result set.
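
A minimal sketch of that idea, assuming a stand-in ScoreDoc class that models only the public doc and score fields of Lucene's org.apache.lucene.search.ScoreDoc. Since the array index is the 'nth number in set', a one-pass map from doc ID to array index recovers the position without the Hits object. RankLookup and positionsByDocId are hypothetical names used for illustration, not Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.lucene.search.ScoreDoc (assumption: only the
// public fields 'doc' and 'score' are modelled here).
class ScoreDoc {
    final int doc;
    final float score;

    ScoreDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical helper, not part of Lucene.
class RankLookup {
    // Maps each doc ID to its zero-based position in the result array.
    static Map<Integer, Integer> positionsByDocId(ScoreDoc[] hits) {
        Map<Integer, Integer> positions = new HashMap<Integer, Integer>();
        for (int i = 0; i < hits.length; i++) {
            positions.put(hits[i].doc, i);
        }
        return positions;
    }

    public static void main(String[] args) {
        ScoreDoc[] hits = {
            new ScoreDoc(42, 3.1f), new ScoreDoc(7, 2.4f), new ScoreDoc(19, 0.9f)
        };
        Map<Integer, Integer> positions = positionsByDocId(hits);
        System.out.println(positions.get(7)); // prints 1
    }
}
```

The map is only valid for the ScoreDoc[] it was built from; a different query, or a changed index, needs a fresh map.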

Best
Erick

On Fri, Nov 20, 2009 at 3:21 PM, it99 <deswiatlowski@syrres.com> wrote:

>
> Thanks for the info!
> I was comparing the hit's position in the Hits result set (the nth number
> in the set) to the documentId, so that's why they are different. Is there
> any way to get the 'nth number in set' if you have the docId without using
> the Hits object? Or is that a Hits-only thing?
>
>
>
> Erick Erickson wrote:
> >
> > The doc IDs should be consistent *unless* you did anything to the
> > index, including things you might not think would change anything. For
> > instance, any kind of commit (assuming you'd ever deleted a document, say).
> >
> > So if you haven't changed your index at all, your doc IDs won't change.
> > But as I said, some operations you don't think would change your doc IDs
> > actually will....
> >
> > HTH
> > Erick
> >
> > On Fri, Nov 20, 2009 at 10:17 AM, it99 <deswiatlowski@syrres.com> wrote:
> >
> >>
> >> Thanks that helped a lot with the speed!!
> >> I am getting the same search results but with different docIds. Is this
> >> expected and OK? Are they just arbitrary numbers?
> >>
> >> If I changed from
> >>             Hits hits = mSearcher.search(query, filter);
> >>
> >> To the following
> >>             TopDocCollector collector = new TopDocCollector(1000000);
> >>             mSearcher.search(query, filter, collector);
> >>             hits = collector.topDocs().scoreDocs;
> >>
> >> Ian Lea wrote:
> >> >
> >> > First queries are often slow and subsequent ones faster.  Search the
> >> > list for warming - I think there was something on it in the last
> >> > couple of days.  Or read the "When measuring performance, disregard
> >> > the first query" bit of
> >> > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
> >> >
> >> > A good number to pass to the Collector is however many docs you are
> >> > going to be interested in.  If you are just going to display the first
> >> > 10, pass 10.
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> >
> >> > On Thu, Nov 19, 2009 at 3:36 PM, it99 <deswiatlowski@syrres.com> wrote:
> >> >>
> >> >> What is the best way to iterate across all the documents in a search
> >> >> results?
> >> >> Previously I was using the deprecated Hits object but changed the
> >> >> implementation, as recommended in the javadocs, to ScoreDoc.
> >> >>
> >> >> I've tried the following, but I've seen warnings about performance.
> >> >> It seems the first time I query something it takes a long time, and
> >> >> after that it is quick.
> >> >>
> >> >>
> >> >>
> >> >>                for (int i = 0; i < mNumberOfHits; i++)
> >> >>                {
> >> >>
> >> >>                    int docId = hits[i].doc;
> >> >>                    Document doc = searcher.doc(docId);
> >> >>                }
> >> >>
> >> >> Here's the code for the search.
> >> >> What is a good number to pass into TopDocCollector?
> >> >>
> >> >>            TopDocCollector collector = new TopDocCollector(1000000);
> >> >>            searcher.search(query, collector);
> >> >>            ScoreDoc[] hits = collector.topDocs().scoreDocs;
> >> >> --
> >> >> View this message in context:
> >> >> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26421373.html
> >> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >>
> >>
> >
> >
>
>
>
>
>
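
Ian's advice earlier in the thread, to pass the collector only as many docs as you are interested in, can be illustrated with a bounded priority queue, which is the general technique behind top-N collection. This is a sketch under stated assumptions, not Lucene's collector code: TopNSketch and topN are hypothetical names, and plain float scores stand in for scored documents. The queue never holds more than n entries, so asking for 10 keeps memory and per-hit work small, whereas asking for 1000000 sizes everything for a million results.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical illustration, not part of Lucene.
class TopNSketch {
    // Returns the n highest scores in descending order.
    static float[] topN(float[] scores, int n) {
        if (n <= 0) {
            return new float[0];
        }
        // Min-heap: the weakest retained score sits at the head.
        PriorityQueue<Float> heap = new PriorityQueue<Float>(n);
        for (float s : scores) {
            if (heap.size() < n) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the current weakest entry
                heap.add(s);
            }
        }
        // Drain the heap from weakest to strongest, filling back to front.
        float[] result = new float[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 1.5f, 0.9f, 3.0f, 0.1f};
        System.out.println(Arrays.toString(topN(scores, 2))); // prints [3.0, 1.5]
    }
}
```

If every matching document really is needed, a top-N queue is the wrong shape for the job, and a collector that simply records each doc ID as it is reported may fit better.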
