lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Read large size index
Date Tue, 30 Jun 2009 12:38:44 GMT
The correct way to iterate over all results is to use a custom HitCollector
(Collector in 2.9) instance. Its collect(docid, score) method is called once
for every hit, so no result array needs to be allocated:

e.g.:
searcher.search(query, new HitCollector() {
	@Override public void collect(int docid, float score) {
		// do something with docid
	}
});
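
To make the callback pattern concrete without tying it to a particular Lucene
version, here is a minimal self-contained sketch. Note that HitCallback and
search(...) are hypothetical stand-ins for HitCollector and
searcher.search(query, collector); they are not Lucene API. The sketch records
every matching doc id in a BitSet, so memory use depends on the number of
documents in the index rather than on a requested result count.

```java
import java.util.BitSet;

public class CollectSketch {
    /** Hypothetical stand-in for Lucene's HitCollector callback. */
    interface HitCallback {
        void collect(int docId, float score);
    }

    /** Hypothetical stand-in for searcher.search(query, collector): reports each hit once. */
    static void search(int[] docIds, float[] scores, HitCallback callback) {
        for (int i = 0; i < docIds.length; i++) {
            callback.collect(docIds[i], scores[i]);
        }
    }

    /** Marks every matching doc id in a BitSet; no score array is kept. */
    static BitSet matchingDocs(int[] docIds, float[] scores, int maxDoc) {
        final BitSet bits = new BitSet(maxDoc);
        search(docIds, scores, new HitCallback() {
            @Override public void collect(int docId, float score) {
                bits.set(docId); // do something with docId; the score is ignored here
            }
        });
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = matchingDocs(new int[] {4, 1, 7}, new float[] {0.5f, 0.9f, 0.1f}, 10);
        System.out.println(bits.cardinality() + " hits: " + bits);
    }
}
```

With a real index, the body of collect() would be the place to do the
per-document work, as in the snippet above.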

TopDocCollector is meant to give a relevance-sorted view of the top-ranking
hits. It is not meant for iterating over the whole result set; in full-text
search, nobody would normally do this (Google, for example, does not let you
go beyond page 100). If you want to display the top 10 results, you can use
TopDocCollector(10).
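
On the "Results 1 - 10 of about 51,200" question from the quoted message: as
far as I know, a top-N collector counts every hit it sees even though it keeps
only N of them, and in 2.4 that count is exposed through
TopDocCollector.getTotalHits() and TopDocs.totalHits. The following is only a
sketch of the idea in plain Java, not Lucene's implementation; TopNSketch and
Hit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    /** One scored hit (hypothetical helper type). */
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    private final int n;
    private final PriorityQueue<Hit> queue; // min-heap: weakest kept hit on top
    private int totalHits;

    TopNSketch(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
        // The queue is sized for n entries up front, so a huge n is costly
        // even when only a few documents match.
        this.queue = new PriorityQueue<Hit>(n, new Comparator<Hit>() {
            @Override public int compare(Hit a, Hit b) {
                return Float.compare(a.score, b.score);
            }
        });
    }

    /** Called once per hit: always counts it, keeps it only if it ranks in the top n. */
    void collect(int docId, float score) {
        totalHits++;
        if (queue.size() < n) {
            queue.add(new Hit(docId, score));
        } else if (score > queue.peek().score) {
            queue.poll();
            queue.add(new Hit(docId, score));
        }
    }

    int getTotalHits() { return totalHits; }

    /** Returns the kept hits, best score first. */
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<Hit>(queue);
        Collections.sort(hits, new Comparator<Hit>() {
            @Override public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        TopNSketch collector = new TopNSketch(2);
        float[] scores = {0.2f, 0.9f, 0.5f, 0.1f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        System.out.println("Results 1 - " + collector.topHits().size()
                + " of " + collector.getTotalHits());
    }
}
```

So the page only ever needs a small N, and the total for the "of about ..."
line comes from the counter rather than from a larger result array.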

Uwe


-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: m.harig [mailto:m.harig@gmail.com]
> Sent: Tuesday, June 30, 2009 2:31 PM
> To: java-user@lucene.apache.org
> Subject: Re: Read large size index
> 
> Hi there,
> 
> On Tue, Jun 30, 2009 at 12:41 PM, m.harig<m.harig@gmail.com> wrote:
> >
> > Thanks Simon,
> >
> > It's working now, thanks a lot. I have a question:
> >
> > I've got 30,000 PDF files indexed, but the code you sent returns only
> > 200 results, because I am setting TopDocs topDocs =
> > searcher.search(query, 200). As I said, if I use Integer.MAX_VALUE, it
> > fails with a Java heap space error; I can't even use 300.
> The Integer.MAX_VALUE was my fault. Internally, Lucene allocates an
> array of size n (searcher.search(query, n)) even if your query only
> returns 1 document. This causes the OOM. Only get as many results as
> you need!
> 
> Then again, is iterating over and loading all of those documents necessary?
> 
> No need to iterate over all documents. I set searcher.search(query, 10000)
> and am getting the results.
> 
> What is your use case for Lucene where you have to load 30k
> documents? You have to be aware that if you load 30k docs you need
> enough memory for them in your JVM. I have no idea how you index and
> what you store in the index, but 30k PDFs with -Xmx128M is not much :)
> 
> Is there any way to get the total number of hits from the index when I
> search for a keyword? I set TopDocCollector collector = new
> TopDocCollector(10000);, so the results will never exceed 10k. What I am
> asking is how to display the total number of hits in the index, which
> might be more than 10k, the way Google does: "Results 1 - 10 of about
> 51,200". Can you please tell me?
> 
> simon
> 
> --
> View this message in context:
> http://www.nabble.com/Read-large-size-index-tp24251993p24271025.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


