lucene-solr-user mailing list archives

From Floyd Wu <floyd...@gmail.com>
Subject Re: how to index 20 MB plain-text xml
Date Mon, 31 Mar 2014 06:17:19 GMT
Hi Alex,

Thanks for your response. Personally, I don't want to feed these big XML
files to Solr, but the users want it.
I'll try your suggestions later.

Many thanks.

Floyd



2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch <arafalov@gmail.com>:

> Without digging too deep into why exactly this is happening, here are
> the general options:
>
> 0. Are you actually committing? Check the messages in the logs and see
> if the records show up when you expect them to.
> 1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the
> HTTP buffer that's blowing up? Try using stream.file instead (notice
> the security warning though): http://wiki.apache.org/solr/ContentStream
> 2. Split the file into smaller ones and commit each separately
> 3. Set a hard auto-commit in solrconfig.xml, based on the number of
> documents, to flush in-memory structures to disk
> 4. Switch to using DataImportHandler to pull from XML instead of pushing
> 5. Increase the amount of memory available to Solr (-X JVM command-line
> flags, e.g. -Xmx)
>
> Regards,
>    Alex.
>
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
> On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
> > I have many plain-text XML files that I convert to the Solr XML format.
> > But every time I send them to Solr, I hit an OOM exception.
> > How do I configure Solr to "eat" these big XML files?
> > Please point me in the right direction. Thanks.
> >
> > floyd
>
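
For option 2 in Alex's list (splitting the file into smaller ones), a minimal Python sketch might look like the following. It assumes the input is a single <add> element whose direct children are <doc> elements. The function name split_solr_xml, the part-NNNNN.xml file naming, and the default of 1000 documents per file are illustrative choices, not anything taken from the thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_solr_xml(src, out_dir, docs_per_file=1000):
    """Split a Solr <add><doc>...</doc></add> file into smaller <add> files.

    Hypothetical helper: returns the list of files written. The input is
    parsed incrementally, so finished documents are dropped from memory
    as soon as they have been written out.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    batch = ET.Element("add")

    def flush():
        # Write the current batch (if any) and start a new, empty one.
        nonlocal batch
        if len(batch) == 0:
            return
        path = out_dir / f"part-{len(written):05d}.xml"
        ET.ElementTree(batch).write(str(path), encoding="utf-8",
                                    xml_declaration=True)
        written.append(path)
        batch = ET.Element("add")

    depth = 0
    root = None
    for event, elem in ET.iterparse(str(src), events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem  # the outer <add> element
        else:
            depth -= 1
            # depth == 1 means we just closed a direct child of the root,
            # so nested <doc> elements stay inside their parent document.
            if depth == 1 and elem.tag == "doc":
                root.remove(elem)  # detach so the source tree does not grow
                batch.append(elem)
                if len(batch) >= docs_per_file:
                    flush()
    flush()  # write the final, possibly smaller, batch
    return written
```

Each resulting part can then be sent to Solr and committed separately, which keeps any single request well below the size of the original file. The right value for docs_per_file depends on document size and available heap, so it would need some experimentation.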
