lucene-java-user mailing list archives

From aslam bari <>
Subject Re: Big size xml file indexing
Date Mon, 22 Jan 2007 04:50:09 GMT
Hi Saikrishna,
Thanks for the reply.
But I don't know how to go about this. Here is my code sample; let me know what to change.

SAXBuilder builder = new SAXBuilder();

// CONTENT here is a ByteArrayInputStream; I know I can also pass a file URL here.
// Let me know which is best.
Document doc = builder.build(content);
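[A note on the memory problem above: JDOM's SAXBuilder builds the entire document tree in memory, which is why a 35-50 MB file can exhaust a 300-500 MB heap. A minimal sketch of the streaming alternative, using the JDK's built-in SAX parser instead of JDOM — the element name `title` and the helper `extractTitles` are hypothetical placeholders for whatever XPath nodes you actually index; in a real indexer you would create a Lucene Document in `endElement` instead of collecting strings:]

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingIndexSketch {

    // Streams the XML; memory use is bounded by one element's text,
    // not the whole file, unlike a JDOM tree.
    public static List<String> extractTitles(byte[] xml) throws Exception {
        final List<String> titles = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals("title")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("title")) {
                    // In a real indexer: build and add a Lucene Document here,
                    // then discard the buffer so memory stays flat.
                    titles.add(text.toString());
                    text = null;
                }
            }
        });
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<docs><document><title>a</title></document>"
                   + "<document><title>b</title></document></docs>";
        List<String> titles = extractTitles(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(titles); // prints [a, b]
    }
}
```

[The trade-off: you lose XPath, so this only works if your per-document manipulation can be expressed as element-by-element callbacks.]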


----- Original Message ----
From: saikrishna venkata pendyala <>
Sent: Monday, 22 January, 2007 10:07:27 AM
Subject: Re: Big size xml file indexing

Hi,
       I have indexed a 6.2 GB XML file using Lucene. What I did was:
        1. I split the 6.2 GB file into small files, each of size
        2. Then I wrote a Python script to quantize the number
of documents in each file.

        Structure of my xml file is """
        </document> """

Since you cannot go beyond 500 MB, this technique might help you — provided, of course,
that your file structure is the same.
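[The chunk size and the Python script aren't shown in the message above, but the splitting idea can be sketched as follows — in Java, to match the rest of the thread. This is an assumption-laden sketch, not the poster's actual script: it assumes each `</document>` close tag sits on its own line (as the truncated structure above suggests) and splits on complete document boundaries, so each chunk can be parsed independently after wrapping it in a root element; `docsPerChunk` is an invented parameter:]

```java
import java.util.ArrayList;
import java.util.List;

public class XmlSplitSketch {

    // Splits on complete </document> boundaries so no document is torn
    // across two chunks. For a real 6.2 GB file you would stream with a
    // BufferedReader and write each chunk to disk instead of holding
    // strings in memory.
    public static List<String> split(String xml, int docsPerChunk) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String line : xml.split("\n")) {
            current.append(line).append('\n');
            if (line.contains("</document>") && ++count == docsPerChunk) {
                chunks.add(current.toString());
                current.setLength(0);
                count = 0;
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString()); // trailing partial chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<document>1</document>\n"
                   + "<document>2</document>\n"
                   + "<document>3</document>";
        System.out.println(split(xml, 2).size()); // prints 2
    }
}
```

[Each resulting chunk stays small enough to parse within a 300-500 MB heap, which is the point of the technique.]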

On 1/22/07, aslam bari <> wrote:
> Dear all,
> I'm using Lucene to index XML files. For parsing I'm using JDOM to get
> XPath nodes, do some manipulation on them, and index them. Everything
> works well, but when the file is very big, about 35-50 MB, it runs out
> of memory or takes a very long time. How can I set some parameters to
> speed this up and use less memory to parse the file? The problem is
> that I cannot increase the heap size much, so I am limited to a heap of
> 300-500 MB. Does anybody have a solution for this?
> Thanks...
> __________________________________________________________
> Yahoo! India Answers: Share what you know. Learn something new
