lucene-general mailing list archives

From "MariLuz Elola" <mel...@seinet.es>
Subject Re: OUTOFMEMORY ERROR
Date Thu, 07 Jul 2005 14:16:55 GMT
Erik, I still have a problem.
First, I have created several IndexWriters.
One of them has 210,000 documents, and in the future there will be
IndexWriters with millions of documents.
I need to obtain all the documents.
I am searching with the query ID:0* because this query returns all the
documents.
More precisely, I am reading the ID metadata (hits.doc(start).get(.ID)) to
collect the IDs of all the documents of a specific IndexWriter.
I get an out-of-memory error while doing this.
About maxClauseCount (by default 1024), I am setting this property:
org.apache.lucene.search.BooleanQuery.maxClauseCount=es.seinet.xtent.searchEngine.lucene.general.Util.MAX_LUCENE_DOCUMENTS;
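
To make the role of maxClauseCount concrete: as Erik explains in the quoted
message below, a prefix query such as ID:0* is rewritten into a BooleanQuery
with one OR clause per unique term that starts with the prefix. The following
plain-Java sketch imitates that expansion without calling Lucene. The class
PrefixExpansionSketch, the methods expandPrefix and sampleIds, and the
zero-padded ID format are all assumptions made for illustration. It suggests
why raising the limit lets the query run but also makes it build one clause
per document ID, which is a plausible source of the memory pressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Plain-Java illustration only; it does not call Lucene.
class PrefixExpansionSketch {

    // Stand-in for the default BooleanQuery clause limit (1,024).
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Collects one "clause" per unique term that starts with the prefix
    // and fails as soon as the limit would be exceeded.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException("too many clauses: limit is " + maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Builds n unique zero-padded IDs such as "0000000", "0000001", ...
    // (hypothetical ID format, chosen so that every ID starts with "0").
    static SortedSet<String> sampleIds(int n) {
        SortedSet<String> ids = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            ids.add(String.format(Locale.ROOT, "%07d", i));
        }
        return ids;
    }

    public static void main(String[] args) {
        SortedSet<String> ids = sampleIds(210_000);
        // With the limit raised, every unique ID becomes its own clause.
        System.out.println(expandPrefix(ids, "0", Integer.MAX_VALUE).size()); // prints 210000
        try {
            expandPrefix(ids, "0", DEFAULT_MAX_CLAUSES);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "too many clauses: limit is 1024"
        }
    }
}
```

A narrower prefix, or an index in which the queried field has few unique
terms, keeps the clause count small. That is the "indexing differently"
option Erik mentions.
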
You gave me an idea: to use IndexReader instead of IndexSearcher to get
all the documents.
However, I don't think it is possible to use IndexReader, because I need the
ID, not the physical files:

      Directory directory = FSDirectory.getDirectory(path, false);
      IndexReader reader = IndexReader.open(directory);
      for (int i = 0; i < reader.maxDoc(); i++) ............

Moreover, "directory" contains the documents of all the IndexWriters.
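
For reference, a sketch of how such a walk could still return stored field
values (the ID) rather than file-level data, and how documents belonging to
other indexes could be skipped. It is written against a hypothetical DocSource
interface so that it runs without Lucene on the classpath. Its three methods
mirror IndexReader.maxDoc(), IndexReader.isDeleted(int) and
IndexReader.document(int), with a Map standing in for the Document returned by
Lucene. The "COLLECTION" field used to tell the indexes apart is an
assumption, since the thread does not say how documents are tagged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; DocSource is a hypothetical stand-in, not a Lucene type.
class ReaderWalkSketch {

    // Mirrors the three calls the loop needs: maxDoc(), isDeleted(n), document(n).
    interface DocSource {
        int maxDoc();
        boolean isDeleted(int docNum);
        Map<String, String> document(int docNum); // stored fields of one document
    }

    // Walks every document slot and returns the stored "ID" values of the
    // live documents whose (hypothetical) "COLLECTION" field matches.
    static List<String> collectIds(DocSource reader, String collection) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) {
                continue; // slot belongs to a deleted document
            }
            Map<String, String> doc = reader.document(i);
            String id = doc.get("ID");
            if (id != null && collection.equals(doc.get("COLLECTION"))) {
                ids.add(id);
            }
        }
        return ids;
    }

    // Convenience helper that builds one stored document for the demo.
    static Map<String, String> doc(String id, String collection) {
        Map<String, String> fields = new HashMap<>();
        fields.put("ID", id);
        fields.put("COLLECTION", collection);
        return fields;
    }

    // In-memory DocSource backed by a list; deleted[i] marks slot i as deleted.
    static DocSource inMemory(final List<Map<String, String>> docs, final boolean[] deleted) {
        return new DocSource() {
            public int maxDoc() { return docs.size(); }
            public boolean isDeleted(int docNum) { return deleted[docNum]; }
            public Map<String, String> document(int docNum) { return docs.get(docNum); }
        };
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("0001", "invoices"));
        docs.add(doc("0002", "contracts"));
        docs.add(doc("0003", "invoices"));
        docs.add(doc("0004", "invoices"));
        boolean[] deleted = {false, false, true, false};
        System.out.println(collectIds(inMemory(docs, deleted), "invoices")); // prints [0001, 0004]
    }
}
```

Because the loop handles one document at a time and keeps only the ID
strings, its memory use grows with the number of IDs collected and not with
the number of query clauses.
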


        Mari Luz

----- Original Message ----- 
From: "MariLuz Elola" <melola@seinet.es>
To: <general@lucene.apache.org>
Sent: Thursday, July 07, 2005 3:40 PM
Subject: Re: OUTOFMEMORY ERROR


> Thanks Erik,
> I was wrong: the query that actually throws the OutOfMemory error is ==>
> ID:0* -ID:xtent.
> I have tried to reproduce the error with the query ID:0*, but the
> exception doesn't appear.
> I will use IndexReader instead of IndexSearcher to get all the
> documents. It's a good idea.
> Another thing: when the user searches without entering any query, internally
> I create the following query ==> ID:0* OR NOT ID:xtent. When QueryParser
> parses this query, I get ID:0* -ID:xtent (translated ==> ID:0*
> AND NOT ID:xtent), don't I? Is QueryParser working incorrectly???
> About maxClauseCount (by default 1024), I am setting this property:
> org.apache.lucene.search.BooleanQuery.maxClauseCount=es.seinet.xtent.searchEngine.lucene.general.Util.MAX_LUCENE_DOCUMENTS;
>
>    Mari Luz
>
> ----- Original Message ----- 
> From: "Erik Hatcher" <erik@ehatchersolutions.com>
> To: <general@lucene.apache.org>
> Sent: Thursday, July 07, 2005 2:46 PM
> Subject: Re: OUTOFMEMORY ERROR
>
>
>
> On Jul 7, 2005, at 6:02 AM, MariLuz Elola wrote:
>> The query is ==> ID:0*
>> This query returns all the documents, exactly 210,000 documents.
>> If the user doesn't specify any criteria in the search user
>> interface, the server searches all the documents.
>
> Doing a prefix query (which ID:0* is) internally builds a
> BooleanQuery OR'ing all unique terms in the ID field that begin with
> a "0".  The built in limit is 1,024 clauses in a BooleanQuery.
>
> You will need to re-think your approach.  If the goal is to return
> all documents, then use IndexReader to walk them.  If the goal is to
> have a general user query expression where ID:0* would be entered you
> will need to account for that possibility with more system resources
> and bumping up the BooleanQuery limit or indexing differently so that
> there are not so many terms being put into the BooleanQuery.  It is
> difficult to offer specific advice as I'm not sure what your use
> cases are.
>
>     Erik
>
>
>
>>
>>    Mari Luz
>>
>>
>>
>> ----- Original Message -----
>> From: "Erik Hatcher" <erik@ehatchersolutions.com>
>> To: <general@lucene.apache.org>
>> Sent: Wednesday, July 06, 2005 8:12 PM
>> Subject: Re: OUTOFMEMORY ERROR
>>
>>
>> We'll need some more details to help.  What query was it?
>>
>>     Erik
>>
>> On Jul 6, 2005, at 1:22 PM, MariLuz Elola wrote:
>>
>>
>>> Hi, I have a problem when I run a simple query, without
>>> sorting, against an index with 210,000 documents.
>>> After executing the query several times I get an OutOfMemory error.
>>> I am creating a new IndexSearcher(pathDir) for every search.
>>> I don't know whether I should create only one IndexSearcher
>>> and cache it.
>>> If I search an index with only 50,000 documents, the OutOfMemory
>>> error doesn't appear.
>>> ------------------------
>>> ENVIRONMENT DESCRIPTION:
>>> ------------------------
>>>
>>> ---SERVER---
>>> MEMORY 2GB
>>> APP SERVER Jboss3.2.3
>>> JAVA_OPTS -Xmx640M -Xms640M
>>>
>>> ----LUCENE 1.4.3-------
>>> INDEX approx. 210,000 documents
>>> EACH DOCUMENT approx. 20 fields (metadata)
>>> SIZE TEXT DOCUMENT 1k
>>>
>>> ------------------------
>>> ERROR:
>>> ------------------------
>>> 18:52:18,657 ERROR [LogInterceptor] Unexpected Error:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,657 ERROR [LogInterceptor] Unexpected Error:
>>> java.lang.OutOfMemoryError
>>> 18:52:18,660 ERROR [STDERR] java.rmi.ServerError: Unexpected Error; nested exception is:
>>>         java.lang.OutOfMemoryError
>>> 18:52:18,661 ERROR [STDERR]     at org.jboss.ejb.plugins.LogInterceptor.handleException(LogInterceptor.java:374)
>>> 18:52:18,661 ERROR [STDERR]     at org.jboss.ejb.plugins.LogInterceptor.invoke(LogInterceptor.java:195)
>>> 18:52:18,661 ERROR [STDERR]     at org.jboss.ejb.plugins.ProxyFactoryFinderInterceptor.invoke(ProxyFactoryFinderInterceptor.java:122)
>>> 18:52:18,662 ERROR [STDERR]     at org.jboss.ejb.StatelessSessionContainer.internalInvoke(StatelessSessionContainer.java:331)
>>> 18:52:18,662 ERROR [STDERR]     at org.jboss.ejb.Container.invoke(Container.java:700)
>>> 18:52:18,662 ERROR [STDERR]     at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>>> 18:52:18,662 ERROR [STDERR]     at sun.reflect.DelegatingMethodAccessorImpl.invok
>>> .
>>> .
>>> Exception java.lang.OutOfMemoryError: requested 4 bytes for CMS: Work queue overflow; try -XX:-CMSParallelRemarkEnabled. Out of swap space?
>>>
>>>
>>> Could anybody help me???
>>>
>>> Thanks in advance
>>>
>>>     Mari Luz
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
> 


