lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From it99 <deswiatlow...@syrres.com>
Subject Re: best way to iterate through all docs from a query
Date Fri, 20 Nov 2009 20:21:53 GMT

Thanks for the info! 
I was comparing from the Hits the nth number in the set number to
the documentId so that's why they are different. Is there anyway to get the
'nth number in set' if you have the docId without using the Hits object? Or
is that a Hits only thing?



Erick Erickson wrote:
> 
> The doc IDs should be consistent *unless* you did anything to the
> index, things you might not think would change anything. For instance,
> any kind of commit (assuming you'd ever deleted a document, say). etc.
> 
> So if you haven't changed your index at all, your doc IDs won't change.
> But as I said, some operations you don't think would change your doc IDs
> actually will....
> 
> HTH
> Erick
> 
> On Fri, Nov 20, 2009 at 10:17 AM, it99 <deswiatlowski@syrres.com> wrote:
> 
>>
>> Thanks that helped a lot with the speed!!
>> I am getting same search results but with different docIds. Is this
>> expected
>> and OK? Are they just arbitrar numbers
>>
>> If  I changed from
>>            Hits hits = mSearcher.search(query, filter);
>>
>>
>> To the following
>>             TopDocCollector collector = new TopDocCollector(1000000);
>>             mSearcher.search(query, filter,collector);
>>            hits = collector.topDocs().scoreDocs;
>>
>>
>>
>>
>>
>>
>>
>> Ian Lea wrote:
>> >
>> > First queries are often slow and subsequent ones faster.  Search the
>> > list for warming - I think there was something on it in the last
>> > couple of days.  Or read the "When measuring performance, disregard
>> > the first query" bit of
>> > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>> >
>> > A good number to pass to the Collector is however many docs you are
>> > going to be interested in.  If you are just going to display the first
>> > 10, pass 10.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Thu, Nov 19, 2009 at 3:36 PM, it99 <deswiatlowski@syrres.com> wrote:
>> >>
>> >> What is the best way to iterate across all the documents in a search
>> >> results?
>> >> Previously I was using the deprecated Hits object but changed the
>> >> implentations as recommended in javadocs to ScoreDoc.
>> >>
>> >> I've tried the following but I've seen warning about peformance.
>> >> Seems the first time I query something it takes long time and then
>> after
>> >> that it is quick.
>> >>
>> >>
>> >>
>> >>                for (int i = 0; i < mNumberOfHits; i++)
>> >>                {
>> >>
>> >>                    int docId = hits[i].doc;
>> >>                    Document doc = searcher.doc(docId);
>> >>                }
>> >>
>> >> Here's the code for the search
>> >> What is good number to pass intot TopDocCollector?
>> >>
>> >>            TopDocCollector collector = new TopDocCollector(1000000);
>> >>            searcher.search(query, collector);
>> >>            ScoreDoc[] hits = collector.topDocs().scoreDocs;
>> >> --
>> >> View this message in context:
>> >>
>> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26421373.html
>> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26442733.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26447343.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message