lucene-general mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: OUTOFMEMORY ERROR
Date Thu, 07 Jul 2005 19:50:13 GMT

On Jul 7, 2005, at 1:12 PM, MariLuz Elola wrote:

> Hi Erik, excuse me for all my questions. Thank you very much for
> your speedy answers, and sorry for my bad English.
> I am Spanish and I don't speak English very well.
> Well, I have one more question.
> In the end I am using IndexReader to return all the documents:
>        Directory directory = FSDirectory.getDirectory(path, false);
>        IndexReader reader = IndexReader.open(directory);
>        for (int start = base; start < end; start++) {
>            Document doc = reader.document(start);
>            String id = doc.get(es.seinet.xtent.searchEngine.lucene.general.Util.ID);
>            ides.add(id);
>        }
> It works fine and is fast. The only problem is that it is impossible
> to sort the results by a metadata field (for example, to get all the
> documents ordered by title).

If you truly need to have a Query that can find all documents, then  
add a special field to each document with a fixed value such as  
doc:yes and then do a TermQuery for doc:yes.  You could then leverage  
Lucene's sorting capability.

> My question is about the maxClauseCount parameter. I agree with
> you: it is not a good idea to bump up the limit...
> If I use the default value (1024) and search, I get this
> error:
> [SearchCollection,executeQuery] caught a class  
> org.apache.lucene.search.BooleanQuery$TooManyClauses
> with message: null
>
> Is there any way to search all the documents (210,000 documents)
> while working internally with only 1024 clauses, returning documents
> up to that limit without getting the TooManyClauses error? I need to
> work efficiently with collections of more than 250,000 records, and
> the users normally run complex queries (e.g. DATE:[20050601 TO
> 20050701] AND TITLE:Lucene* ... etc.)

The issue is that PrefixQuery, WildcardQuery, RangeQuery, and
FuzzyQuery each expand into a BooleanQuery that ORs together every
matching term, one clause per term.  You need to identify which terms
those are and address them individually.  I can't offer specific
advice since I don't know what fields you're using and what values
they may contain.  But one example is dates.  If you index dates at
millisecond granularity but really only need to query by year, there
is a good chance one of those query types will expand past the clause
limit and throw TooManyClauses.  If, instead, you index dates as YYYY
when all you need is year granularity, you have far fewer terms.  I
hope this makes sense and helps.
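
To make the date example concrete, here is a rough sketch in plain
Java.  It does not use the Lucene API; it only imitates the counting.
The class and helper names are hypothetical, and the data set (one
document every six hours during 2005) is an assumption made up for
illustration.  A range query needs one clause per distinct indexed
term inside the range, so the sketch counts those terms at two
granularities.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical demo class; it imitates term expansion, it is not Lucene code.
class DateGranularityDemo {
    // Default BooleanQuery clause limit mentioned in this thread.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Builds the set of distinct date terms an index would hold if every
    // document's date were formatted with the given pattern.
    // Assumption: one document every six hours throughout 2005.
    static NavigableSet<String> indexTerms(String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        NavigableSet<String> terms = new TreeSet<>();
        LocalDateTime end = LocalDateTime.of(2006, 1, 1, 0, 0);
        for (LocalDateTime t = LocalDateTime.of(2005, 1, 1, 0, 0);
             t.isBefore(end); t = t.plusHours(6)) {
            terms.add(t.format(fmt));
        }
        return terms;
    }

    // Number of OR clauses an inclusive range [lower, upper] would need:
    // one per distinct term that sorts inside the range.
    static int clausesForRange(NavigableSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }

    public static void main(String[] args) {
        int fine = clausesForRange(indexTerms("yyyyMMddHHmmssSSS"),
                "20050101000000000", "20051231235959999");
        int coarse = clausesForRange(indexTerms("yyyy"), "2005", "2005");
        System.out.println("millisecond terms: " + fine
                + ", over limit: " + (fine > MAX_CLAUSE_COUNT));
        System.out.println("year terms: " + coarse
                + ", over limit: " + (coarse > MAX_CLAUSE_COUNT));
    }
}
```

Under those assumptions the millisecond-granularity terms need 1460
clauses to cover the year, which is over the 1024 default, while the
YYYY terms need exactly one.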

     Erik

