lucene-solr-user mailing list archives

Subject Re: how to index 20 MB plain-text xml
Date Mon, 31 Mar 2014 06:47:59 GMT

I had the same issue with XML files. Even small XML files produced an OOM
exception. I read that the way XML is parsed can sometimes blow up
memory requirements so much that Java runs out of heap. My solutions:

1. Don't parse XML files
2. Parse only small XML files and hope for the best
3. Give Solr the largest possible Java heap (and hope for the best)

But then again, one time I also got an OOM exception with Word documents - it
turned out that some user had pasted 400 MB worth of photos into a Word 
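
For point 2, one way to end up with only small XML files is to split a
large Solr <add> file into batches with a streaming parser, so the whole
tree is never held in memory at once. The following is a minimal Python
sketch, not something taken from this thread: the function name
split_solr_add and the batch size are made up for illustration, and it
assumes a flat <add> file with no nested <doc> elements. Attributes on
the original <add> element are dropped.

```python
import io
import xml.etree.ElementTree as ET


def split_solr_add(source, batch_size):
    """Yield Solr <add> XML strings with at most batch_size <doc> each.

    Hypothetical helper: `source` is a file path or a binary file object
    containing one <add> element with flat <doc> children.
    """
    # Ask for "start" events too, so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)

    batch = []
    for event, elem in context:
        if event == "end" and elem.tag == "doc":
            # Serialize the finished <doc>, then drop it from the tree
            # so memory use stays roughly bounded by one batch.
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()
            if len(batch) == batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        # Emit whatever is left over as a final, smaller batch.
        yield "<add>" + "".join(batch) + "</add>"


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large file on disk.
    sample = b"<add><doc><field name='id'>1</field></doc></add>"
    for chunk in split_solr_add(io.BytesIO(sample), batch_size=500):
        print(len(chunk), "characters in this batch")
```

Each yielded string could then be sent to Solr as its own update
request, which keeps every individual request small.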



From:   Floyd Wu
Date:   31.03.2014 08:18
Subject:        Re: how to index 20 MB plain-text xml

Hi Alex,

Thanks for your response. Personally, I don't want to feed these big XML
files to Solr, but the users want it.
I'll try your suggestions later.

Many thanks.


2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch:

> Without digging too deep into why exactly this is happening, here are
> the general options:
> 0. Are you actually committing? Check the messages in the logs and see
> if the records show up when you expect them to.
> 1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the
> HTTP buffer that's blowing up? Try using stream.file instead (note the
> security warning, though).
> 2. Split the file into smaller ones and commit each separately
> 3. Set hard auto-commit in solrconfig.xml based on number of documents
> to flush in-memory structures to disk
> 4. Switch to using DataImportHandler to pull from XML instead of pushing
> 5. Increase the amount of memory available to Solr (-X command-line
> flags, e.g. -Xmx)
> Regards,
>    Alex.
> On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu wrote:
> > I have many plain-text XML files that I transform into Solr XML format.
> > But every time I send them to Solr, I hit an OOM exception.
> > How do I configure Solr to "eat" these big XML files?
> > Please guide me. Thanks
> >
> > floyd
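
The stream.file option from the quoted list tells Solr to read the file
from its own local disk instead of receiving it in the HTTP request
body. A minimal Python sketch of building such an update URL follows.
The helper name, host, core name and file path are all assumptions for
illustration, and remote streaming has to be enabled in solrconfig.xml
for Solr to accept the request (hence the security warning).

```python
from urllib.parse import urlencode


def stream_file_update_url(base_url, server_path, commit=True):
    """Build an update URL that asks Solr to read a file from its own disk.

    Hypothetical helper: base_url is the core URL, server_path is a path
    on the Solr server (not on the client).
    """
    params = {
        # Path as seen by the Solr server process.
        "stream.file": server_path,
        # Tell Solr how to interpret the file contents.
        "stream.contentType": "text/xml;charset=utf-8",
    }
    if commit:
        # Commit as part of this request so the documents become visible.
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update?" + urlencode(params)


if __name__ == "__main__":
    # Assumed host, core name and path; adjust to the actual setup.
    print(stream_file_update_url(
        "http://localhost:8983/solr/collection1", "/data/big.xml"))
```

Requesting the resulting URL (for example with curl or a browser)
triggers the import, so the large file never travels through the HTTP
request body.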
