lucene-java-user mailing list archives

From aslam bari <iamasla...@yahoo.co.in>
Subject Re: Big size xml file indexing
Date Mon, 22 Jan 2007 04:50:09 GMT
Hi Saikrishna,
Thanks for the reply,
but I don't know how to apply this. Here is my code sample; let me know what to change.

SAXBuilder builder = new SAXBuilder();

// CONTENT here is a ByteArrayInputStream; I know I can also pass a file URL here. Let me know
// which is best.
Document doc = builder.build(CONTENT);

loop(---)
{
    doc.selectNodes(xpathquery);
}

Thanks...
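
For comparison with the snippet above, which builds the whole document tree in memory before any XPath query runs, a streaming parse holds only one record at a time. The following is a minimal sketch that swaps JDOM for the JDK's StAX API (javax.xml.stream); it is not the approach used in this thread. The class name StreamingRecordReader, the method forEachRecord, and the record element name passed in are all assumptions made for illustration, and the sketch assumes record elements are not nested inside one another.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams over the XML and hands the text of each
// record element to a callback, so only one record is held at a time.
final class StreamingRecordReader {

    static int forEachRecord(Reader in, String recordElement, Consumer<String> handler)
            throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        StringBuilder text = null; // non-null only while inside a record element
        int count = 0;
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && recordElement.equals(xml.getLocalName())) {
                    text = new StringBuilder();
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && text != null) {
                    // Collect text from the record and any child elements.
                    text.append(xml.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && text != null
                        && recordElement.equals(xml.getLocalName())) {
                    handler.accept(text.toString()); // e.g. build and add a Lucene document here
                    text = null;
                    count++;
                }
            }
        } finally {
            xml.close();
        }
        return count; // number of records seen
    }
}
```

The callback is where the per-record manipulation and indexing would go; once it returns, the record's text can be garbage collected, so memory use depends on the largest record rather than on the file size.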
----- Original Message ----
From: saikrishna venkata pendyala <pvsaikrishna@gmail.com>
To: java-user@lucene.apache.org
Sent: Monday, 22 January, 2007 10:07:27 AM
Subject: Re: Big size xml file indexing


Hi,
       I have indexed a 6.2 GB XML file using Lucene. What I did was:
        1.  I split the 6.2 GB file into small files of 10 MB each.
        2.  Then I wrote a Python script to quantize the number of
documents in each file.

        Structure of my xml file is """
       <document>
        -----
        -----
        </document>
        <document>
        -----
        -----
        </document> """

Since you cannot go beyond 500 MB of heap, this technique might help you,
provided of course that your file structure is the same.
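
The splitting step described above can be sketched in Java as follows. This is only an illustration, not the Python script mentioned in the message: it cuts by a fixed number of records per chunk rather than by file size, and it assumes every closing </document> tag ends a line, as in the structure shown. The class name XmlChunkSplitter and the wrapping <chunk> root element (added so that each chunk is well-formed XML on its own) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

// Hypothetical helper: cuts a stream of <document> records into chunks
// that each contain a whole number of records.
final class XmlChunkSplitter {

    static int split(BufferedReader in, int docsPerChunk, Consumer<String> chunkHandler)
            throws IOException {
        if (docsPerChunk <= 0) {
            throw new IllegalArgumentException("docsPerChunk must be positive");
        }
        StringBuilder current = new StringBuilder();
        int docsInCurrent = 0;
        int chunks = 0;
        String line;
        while ((line = in.readLine()) != null) {
            current.append(line).append('\n');
            // Assumption: a record ends on a line that ends with </document>.
            if (line.trim().endsWith("</document>")) {
                docsInCurrent++;
                if (docsInCurrent == docsPerChunk) {
                    chunkHandler.accept(wrap(current));
                    current.setLength(0); // drop the chunk before reading on
                    docsInCurrent = 0;
                    chunks++;
                }
            }
        }
        // Emit the final, partially filled chunk; trailing lines that follow
        // the last complete record are discarded.
        if (docsInCurrent > 0) {
            chunkHandler.accept(wrap(current));
            chunks++;
        }
        return chunks; // number of chunks handed to the callback
    }

    // Wrap the records in a single root element so the chunk parses on its own.
    private static String wrap(StringBuilder body) {
        return "<chunk>\n" + body + "</chunk>\n";
    }
}
```

The callback could write each chunk to its own file, or parse it and index it straight away, so that only one chunk is in memory at any time.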

On 1/22/07, aslam bari <iamaslamok@yahoo.co.in> wrote:
>
> Dear all,
> I am using Lucene to index XML files. For parsing I am using JDOM to get
> XPath nodes, do some manipulation on them, and index them. Everything
> works well, but when the file is very big, about 35 - 50 MB, it runs
> out of memory or takes a lot of time. How can I set some parameters to speed
> it up and use less memory when parsing the file? The problem is that I cannot
> increase the heap size much, so I am limited to a heap size of 300 -
> 500 MB. Does anybody have a solution for this?
>
> Thanks...
>
>
>
> __________________________________________________________
> Yahoo! India Answers: Share what you know. Learn something new
> http://in.answers.yahoo.com/
>


		