lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rob Outar" <rou...@ideorlando.org>
Subject OutOfMemoryException while Indexing an XML file
Date Fri, 14 Feb 2003 13:13:23 GMT
Hi all,

	I was using the sample code provided I believe by Doug Cutting to index an
XML file, the XML file was 2 megs (kinda large) but while adding fields to
the Document object I got an OutOfMemoryException exception.  I work with
XML files a lot, I can easily parse that 2 meg file into a DOM tree, I can't
imagine a Lucene document being larger than a DOM Tree, pasted below is the
SAX handler.

public class XMLDocumentBuilder
extends DefaultHandler {

    /** A buffer for each XML element */
    private StringBuffer elementBuffer = new StringBuffer();

    private Document mDocument;


    public void buildDocument(Document doc, String xmlFile) throws
IOException,
    SAXException {

        this.mDocument = doc;
        SAXReader.parse(xmlFile, this);
    }

    public void startElement(String uri, String localName, String qName,
    Attributes atts) {

        elementBuffer.setLength(0);

        if (atts != null) {

            for (int i = 0; i < atts.getLength(); i++) {

                String attname = atts.getLocalName(i);
                mDocument.add(new Field(attname, atts.getValue(i),
                true, true, true));
            }
        }
    }

    // call when cdata found
    public void characters(char[] text, int start, int length) {
        elementBuffer.append(text, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        mDocument.add(Field.Text(localName, elementBuffer.toString()));
    }
    public Document getDocument() {
        return mDocument;
    }
}

Any help would be appreciated.

Thanks,

Rob


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message