lucene-general mailing list archives

From "MariLuz Elola" <mel...@seinet.es>
Subject Re: OUTOFMEMORY ERROR
Date Thu, 07 Jul 2005 17:12:27 GMT
Hi Erik, excuse me for all my questions. Thank you very much for your speedy 
answers, and sorry for my bad English.
I am Spanish and I don't speak English very well.
Well, I have one more question.
In the end I am using IndexReader to return all the documents:
    Directory directory = FSDirectory.getDirectory(path, false);
    IndexReader reader = IndexReader.open(directory);
    // walk the documents [base, end) by document number
    for (int start = base; start < end; start++) {
        Document doc = reader.document(start);
        String id = doc.get(es.seinet.xtent.searchEngine.lucene.general.Util.ID);
        ides.add(id);
    }
It works fine and is fast. The only problem is that it is impossible to sort 
the results by metadata this way (for example, getting all the documents 
ordered by title).

My question is about the maxClauseCount parameter. I think the same as you: 
it is not a good idea to bump up the limit...
If I use the default value (1024) and search, I get this error:
[SearchCollection,executeQuery] caught a class 
org.apache.lucene.search.BooleanQuery$TooManyClauses
 with message: null

Is there any way to search all the documents (210,000 documents) while 
internally working with only 1024 clauses, returning documents up to 1024 and 
not getting the TooManyClauses error? I need to work efficiently with 
collections of more than 250,000 records, and the users normally run complex 
queries (e.g. DATE:[20050601 TO 20050701] AND TITLE:Lucene* ... etc.)
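For the date part of such a query, one option (a sketch against the Lucene 
1.4.x API, reusing the field names from the example above) is a RangeFilter. A 
filter enumerates matching terms directly against the index, so it never 
builds a BooleanQuery and cannot hit the clause limit:

    // Sketch: express the date restriction as a filter (Lucene 1.4.x).
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeFilter;

    // Dates indexed as yyyymmdd strings sort lexicographically, so a plain
    // term range works; true/true makes both endpoints inclusive.
    Filter dateFilter = new RangeFilter("DATE", "20050601", "20050701", true, true);
    Query titleQuery = QueryParser.parse("TITLE:Lucene*", "TITLE", new StandardAnalyzer());
    Hits hits = searcher.search(titleQuery, dateFilter);

Note that the TITLE:Lucene* part still rewrites into a BooleanQuery, so a 
prefix shared by more than maxClauseCount terms would still overflow.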

Ah!! I have seen that you are Erik Hatcher, the author of Lucene in 
Action!!!
I don't quite understand what you mean about the filter... well, I will read 
the chapter on filtering a search :-D

Thanks in advance

        Mari Luz

----- Original Message ----- 
From: "Erik Hatcher" <erik@ehatchersolutions.com>
To: <general@lucene.apache.org>
Sent: Thursday, July 07, 2005 5:53 PM
Subject: Re: OUTOFMEMORY ERROR



On Jul 7, 2005, at 9:40 AM, MariLuz Elola wrote:
> Thanks Erik,
> I was wrong; the exact query that throws an OutOfMemory error is ==> 
> ID:0* -ID:xtent.
> With the query ID:0* I have tried to reproduce the error, but the 
> exception doesn't appear.

> Another thing: when the user searches without entering any query, internally 
> I am creating this query ==> ID:0* OR NOT ID:xtent.

That's a hairy query.  I definitely do not recommend doing something
like that with prefix queries.  Check out using a Filter for some of
this sort of thing also.
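To illustrate the filter idea for the "everything except ID:xtent" case (an 
untested sketch against the 1.4.x Filter API; the class name is invented):

    // Sketch: a custom Filter that accepts every document except those whose
    // ID is "xtent", replacing the query "ID:0* OR NOT ID:xtent".
    import java.io.IOException;
    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.search.Filter;

    public class NotXtentFilter extends Filter {
        public BitSet bits(IndexReader reader) throws IOException {
            BitSet bits = new BitSet(reader.maxDoc());
            bits.set(0, reader.maxDoc());          // start with every doc accepted
            TermDocs td = reader.termDocs(new Term("ID", "xtent"));
            while (td.next()) {
                bits.clear(td.doc());              // knock out the excluded docs
            }
            td.close();
            return bits;
        }
    }

searcher.search(someQuery, new NotXtentFilter()) would then constrain any 
query to the non-excluded documents, and the BitSet could be cached and 
reused across searches.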

> And parsing this query with QueryParser, I obtain ID:0* -ID:xtent 
> (translated ==> ID:0* AND NOT ID:xtent), isn't it? Is QueryParser working 
> wrong???

It depends.  By default, QueryParser uses OR as the default operator.

> About maxClauseCount (by default 1024), I am setting this property:
> org.apache.lucene.search.BooleanQuery.maxClauseCount=es.seinet.xtent.s 
> earchEngine.lucene.general.Util.MAX_LUCENE_DOCUMENTS;

Bumping up that limit is not necessarily the best thing to do - I
recommend changing your approach to querying all documents rather
than trying to make BooleanQuery happy with an enormously inefficient
query.
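If the limit does have to be touched, it is better done explicitly and in one 
place (a sketch against the 1.4.x API):

    // Sketch: dealing with the clause limit explicitly (Lucene 1.4.x).
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;

    // Option 1: raise the limit once, at application startup.
    BooleanQuery.setMaxClauseCount(4096);

    // Option 2: keep the default and detect the overflow.
    try {
        Hits hits = searcher.search(query);
    } catch (BooleanQuery.TooManyClauses e) {
        // The rewritten query expanded to more than the configured number of
        // clauses; fall back to walking the IndexReader, or ask the user to
        // narrow the query.
    }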

     Erik


>
>    Mari Luz
>
> ----- Original Message ----- From: "Erik Hatcher" 
> <erik@ehatchersolutions.com>
> To: <general@lucene.apache.org>
> Sent: Thursday, July 07, 2005 2:46 PM
> Subject: Re: OUTOFMEMORY ERROR
>
>
>
> On Jul 7, 2005, at 6:02 AM, MariLuz Elola wrote:
>
>> The query is ==> ID:0*
>> This query returns all the documents, exactly 210,000 documents.
>> If the user doesn't specify any criteria in the search user interface, 
>> the server searches all the documents.
>>
>
> Doing a prefix query (which ID:0* is) internally builds a
> BooleanQuery OR'ing all unique terms in the ID field that begin with
> a "0".  The built in limit is 1,024 clauses in a BooleanQuery.
>
> You will need to re-think your approach.  If the goal is to return
> all documents, then use IndexReader to walk them.  If the goal is to
> have a general user query expression where ID:0* would be entered, you
> will need to account for that possibility with more system resources
> (bumping up the BooleanQuery limit) or by indexing differently so that
> not so many terms end up in the BooleanQuery.  It is
> difficult to offer specific advice as I'm not sure what your use
> cases are.
>
>     Erik
>
>
>
>
>>
>>    Mari Luz
>>
>>
>>
>> ---------------------------------------------------
>> Mari Luz Elola, Developer Engineer
>> Caleruega, 67, 28033 Madrid (Spain)
>> Tel.: +34 91 768 46 58
>> mailto: ola@seinet.es
>> ---------------------------------------------------
>> ----- Original Message ----- From: "Erik Hatcher" 
>> <erik@ehatchersolutions.com>
>> To: <general@lucene.apache.org>
>> Sent: Wednesday, July 06, 2005 8:12 PM
>> Subject: Re: OUTOFMEMORY ERROR
>>
>>
>> We'll need some more details to help.  What query was it?
>>
>>     Erik
>>
>> On Jul 6, 2005, at 1:22 PM, MariLuz Elola wrote:
>>
>>
>>
>>> Hi, I have a problem when I am trying to run a simple query 
>>> without sorting against an index with 210,000 documents.
>>> Executing the query several times, I get an OutOfMemory error.
>>> I am creating an IndexSearcher(pathDir) for every search.
>>> I don't know if it would be necessary to create only one IndexSearcher 
>>> and cache it.
>>> If I search an index with only 50,000 documents, the OutOfMemory 
>>> error doesn't appear.
>>> ------------------------
>>> ENVIRONMENT DESCRIPTION:
>>> ------------------------
>>>
>>> ---SERVER---
>>> MEMORY 2GB
>>> APP SERVER Jboss3.2.3
>>> JAVA_OPTS -Xmx640M -Xms640M
>>>
>>> ----LUCENE 1.4.3-------
>>> INDEX: ~210,000 documents
>>> EACH DOCUMENT: ~20 fields (metadata)
>>> TEXT SIZE PER DOCUMENT: 1 KB
>>>
>>> ------------------------
>>> ERROR:
>>> ------------------------
>>> 18:52:18,657 ERROR [LogInterceptor] Unexpected Error:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,657 ERROR [LogInterceptor] Unexpected Error:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,660 ERROR [STDERR] java.rmi.ServerError: Unexpected   Error; 
>>> nested exception is:
>>>         java.lang.OutOfMemoryError
>>> 18:52:18,661 ERROR [STDERR]     at 
>>> org.jboss.ejb.plugins.LogInterceptor.handleException 
>>> (LogInterceptor.java:374)
>>> 18:52:18,661 ERROR [STDERR]     at 
>>> org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:195)
>>> 18:52:18,661 ERROR [STDERR]     at 
>>> org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke 
>>> (ProxyFactoryFinderInterceptor.java:122)
>>> 18:52:18,662 ERROR [STDERR]     at 
>>> org.jboss.ejb.StatelessSessionContainer.internalInvoke 
>>> (StatelessSessionContainer.java:331)
>>> 18:52:18,662 ERROR [STDERR]     at org.jboss.ejb.Container.invoke 
>>> (Container.java:700)
>>> 18:52:18,662 ERROR [STDERR]     at 
>>> sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>>> 18:52:18,662 ERROR [STDERR]     at 
>>> sun.reflect.DelegatingMethodAccessorImpl.invok
>>> .
>>> .
>>> Exception java.lang.OutOfMemoryError: requested 4 bytes for CMS: Work 
>>> queue overflow; try -XX:-CMSParallelRemarkEnabled. Out of swap space?
>>>
>>>
>>> Could anybody help me???
>>>
>>> Thanks in advance
>>>
>>>     Mari Luz