lucene-java-user mailing list archives

From "Harini Raghavan" <harini.ragha...@insideview.com>
Subject Re: OutOfMemory errors while indexing large documents
Date Mon, 25 Jul 2005 14:22:35 GMT
I am using org.htmlparser.parserapplications.StringExtractor to parse the
HTML pages. I guess the OutOfMemory error occurs while parsing the large
HTML pages and not while indexing. Sorry about the confusion.
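
One way to confirm which step actually runs out of memory is to run the parse and index steps as separate stages and report the first one that throws. The following is only a minimal sketch, assuming Java 8 or later. StageProbe and firstFailingStage are hypothetical names, and the two stage bodies in main are placeholders for the real StringExtractor and IndexWriter calls, which are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StageProbe {

    /**
     * Runs each named stage in insertion order. Returns the name of the
     * first stage that throws OutOfMemoryError, or null if none does.
     */
    static String firstFailingStage(Map<String, Runnable> stages) {
        Runtime rt = Runtime.getRuntime();
        for (Map.Entry<String, Runnable> stage : stages.entrySet()) {
            // Heap in use just before the stage starts, for the report below.
            long usedBefore = rt.totalMemory() - rt.freeMemory();
            try {
                stage.getValue().run();
            } catch (OutOfMemoryError oom) {
                long mb = 1024L * 1024L;
                System.err.println("OutOfMemoryError in stage '" + stage.getKey()
                        + "': heap used before stage " + (usedBefore / mb)
                        + " MB, max heap " + (rt.maxMemory() / mb) + " MB");
                // The complete stack trace shows where the allocation failed.
                oom.printStackTrace();
                return stage.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Runnable> stages = new LinkedHashMap<>();
        // Placeholder (assumption): extract text from one HTML page here.
        stages.put("parse", () -> { });
        // Placeholder (assumption): add the extracted text to the index here.
        stages.put("index", () -> { });
        System.out.println("first failing stage: " + firstFailingStage(stages));
    }
}
```

The stage name and the printed stack trace together show whether the failure is in the HTML parsing or in the indexing.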

----- Original Message ----- 
From: "Erik Hatcher" <erik@ehatchersolutions.com>
To: <java-user@lucene.apache.org>
Sent: Monday, July 25, 2005 6:43 PM
Subject: Re: OutOfMemory errors while indexing large documents


> Could you be more specific about where the OutOfMemory error is
> happening? Do you have a complete stack trace?
>
> As for maxFieldLength: in my use of Lucene, it is necessary to index the
> entire document and not just the first 10,000 or so terms, so I set
> maxFieldLength to Integer.MAX_VALUE.
>
>     Erik
>
>
> On Jul 25, 2005, at 7:30 AM, Harini Raghavan wrote:
>
>> Hi All,
>> I am using Lucene to index large documents (HTML pages). The application
>> is running on JBoss and MySQL on UNIX. Indexing throws OutOfMemory
>> errors beyond a certain point, and I am not sure why this is happening.
>> I am using the default IndexWriter properties, but the Lucene
>> documentation mentions setting the max field length on the IndexWriter
>> to some optimum value for large documents. Is anyone aware of any
>> optimum settings for maxFieldLength, mergeFactor, minMergeDocs and
>> maxMergeDocs?
>> Thanks,
>> Harini
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>

