lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Wechner <michael.wech...@wyona.com>
Subject Re: Parsing large xml files
Date Fri, 22 May 2009 08:48:21 GMT
crackeur@comcast.net schrieb:
> http://vtd-xml.sf.net 
>
>
> ----- Original Message ----- 
> From: "Sithu D. Sudarsan" <Sithu.Sudarsan@fda.hhs.gov> 
> To: java-user@lucene.apache.org 
> Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific 
> Subject: Parsing large xml files 
>
>
> Hi, 
>
> While trying to parse xml documents of about 50MB size, we run into 
> OutOfMemoryError due to java heap space. Increasing JVM to use close 2GB 
> (that is the max), does not help. Is there any API that could be used to 
> handle such large single xml files? 
>   

I am not familiar with that particular code of Lucene, but is it 
possible that Lucene is using DOM for this parsing?
If so, one could try to replace it by SAX, and hence get rid of the 
OutOfMemory issue.

Cheers

Michael
> If Lucene is not the right place, please let me know alternate places to 
> look for, 
>
> Thanks in advance, 
> Sithu D Sudarsan 
> sithu.sudarsan@fda.hhs.gov 
> sdsudarsan@ualr.edu 
>
>
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message