lucene-solr-user mailing list archives

From David Vdd <>
Subject Continue committing after out of memory of contrib library. (tika)
Date Tue, 14 May 2013 14:57:58 GMT
I'm using a combination of Tika and custom code to extract text from files
(with SolrJ).
Looking at the number of files in my index, I noticed that many of them
were missing.
Then I went to the Solr admin panel and saw this in the log files:

        java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

        null:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

        auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

After this, all uploads through Tika seem to fail (internal server error 500).

This is the code I use to upload stuff with Tika:

    SolrServer solr;

    public void IndexFile(File fileToIndex) throws IOException,
            SolrServerException {
        // "/update/extract" is the default extract handler path; the
        // original message was truncated at this line
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(fileToIndex, "application/octet-stream");
        up.setParam("literal.filename", fileToIndex.getName());
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up); // send the upload (the snippet was cut off here)
    }

Is there a way to skip the file that caused the out-of-memory error and
*continue extracting/indexing*? I don't know how to do this in SolrJ.
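One way to get the skip-and-continue behavior (a sketch, not something from this thread; `Indexer` and `indexAll` are hypothetical names) is to drive the batch from the client and catch per-file failures. Note that once the server-side IndexWriter has hit an OutOfMemoryError it refuses all further commits until the core is reloaded or Solr is restarted, so catching on the client only keeps one bad file from taking down the rest of an otherwise healthy batch:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: index files one at a time and skip any file whose
// extraction/indexing throws, instead of letting one failure abort the batch.
public class SkipFailedFiles {

    /** Hook so the real indexing call (e.g. the SolrJ IndexFile above) can be plugged in. */
    public interface Indexer {
        void index(File f) throws Exception;
    }

    /** Indexes every file, collecting the ones that failed for logging or retry. */
    public static List<File> indexAll(List<File> files, Indexer indexer) {
        List<File> failed = new ArrayList<>();
        for (File f : files) {
            try {
                indexer.index(f);   // e.g. upload via ContentStreamUpdateRequest
            } catch (Exception e) {
                failed.add(f);      // skip this file and keep going
            }
        }
        return failed;
    }
}
```

The returned list can then be retried with a larger heap, or extracted locally with PDFBox the way you already handle PDFs.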

All the files I uploaded manually kept working (because I index each page
of a PDF separately using PDFBox).
Only the files that went through Tika threw exceptions and didn't commit.

I know I could have increased the memory parameters, but some Excel files
fail to extract even with 16 GB of memory assigned. I've tested it with the tika
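A cruder workaround (purely an assumption on my part; `SizeGuard` and the 50 MB cutoff are made up) would be to pre-filter files by size before they ever reach the extract handler, so the obvious heap-killers never hit Tika on the server:

```java
import java.io.File;

// Hypothetical pre-filter: refuse to upload files above a size threshold
// so the largest documents never reach server-side extraction.
public class SizeGuard {
    static final long MAX_BYTES = 50L * 1024 * 1024; // 50 MB, an arbitrary cutoff

    public static boolean withinLimit(long bytes) {
        return bytes <= MAX_BYTES;
    }

    public static boolean safeToExtract(File f) {
        return withinLimit(f.length()); // length() is 0 for missing files
    }
}
```

File size is only a rough proxy: a small spreadsheet can still expand enormously during extraction, so this filters the obvious cases rather than guaranteeing safety.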
