lucene-solr-user mailing list archives

From Floyd Wu <floyd...@gmail.com>
Subject Re: how to index 20 MB plain-text xml
Date Tue, 01 Apr 2014 00:30:30 GMT
Hi Upayavira,
Users don't hit Solr directly; they search documents through my application.
The application is the entry point where users upload documents, which are
then indexed by Solr.
In this case they upload a plain-text file, something like a dictionary, and
as you know, a dictionary is big.
I'm trying to figure out a good technique for splitting these XML files into
smaller ones and streaming them to Solr.
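
A minimal sketch of that split-and-stream idea, in Python, assuming the big
file is already in Solr update XML form (<add><doc>...</doc>...</add>). The
function names, the batch size, and the update URL in the usage note are
illustrative assumptions, not something taken from this thread. The file is
parsed incrementally, so the whole 20 MB tree is never held in memory at
once.

```python
import urllib.request
import xml.etree.ElementTree as ET


def iter_add_payloads(source, batch_size=500):
    """Yield "<add>...</add>" strings holding at most batch_size docs each.

    source is a file path or a binary file object containing Solr update
    XML. Only <doc> elements that are direct children of the root are
    split out, so nested child documents stay inside their parent.
    """
    depth = 0
    root = None
    batch = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem  # remember the outer <add> element
            continue
        depth -= 1
        if depth == 1 and elem.tag == "doc":
            elem.tail = None  # drop whitespace that followed the element
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()  # release docs that are already serialized
            if len(batch) == batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        yield "<add>" + "".join(batch) + "</add>"


def post_xml(update_url, payload):
    """POST one XML update message to Solr and return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_file(path, update_url, batch_size=500):
    """Send the file to Solr in batches, then issue a single commit."""
    for payload in iter_add_payloads(path, batch_size):
        post_xml(update_url, payload)
    post_xml(update_url, "<commit/>")
```

It could be called as index_file("dictionary.xml",
"http://localhost:8983/solr/collection1/update"), where both the file name
and the core name are placeholders. Committing once at the end rather than
per batch is a design choice here; a hard auto-commit in solrconfig.xml
would work as well.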

Floyd



2014-04-01 2:55 GMT+08:00 Upayavira <uv@odoko.co.uk>:

> Tell the user they can't have it!
>
> Or, write a small app that reads in their XML in one go, and pushes it
> in parts to Solr. Generally, I'd say letting a user hit Solr directly is
> a bad thing - especially a user who doesn't know the details of how Solr
> works.
>
> Upayavira
>
> On Mon, Mar 31, 2014, at 07:17 AM, Floyd Wu wrote:
> > Hi Alex,
> >
> > Thanks for your response. Personally I don't want to feed these big XML
> > files to Solr, but the users want to.
> > I'll try your suggestions later.
> >
> > Many thanks.
> >
> > Floyd
> >
> >
> >
> > 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch <arafalov@gmail.com>:
> >
> > > Without digging too deep into why exactly this is happening, here are
> > > the general options:
> > >
> > > 0. Are you actually committing? Check the messages in the logs and see
> > > if the records show up when you expect them to.
> > > 1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the
> > > HTTP buffer that's blowing up? Try using stream.file instead (notice
> > > security warning though): http://wiki.apache.org/solr/ContentStream
> > > 2. Split the file into smaller ones and commit each separately
> > > 3. Set hard auto-commit in solrconfig.xml based on number of documents
> > > to flush in-memory structures to disk
> > > 4. Switch to using DataImportHandler to pull from XML instead of pushing
> > > 5. Increase the amount of memory available to Solr (-X command line
> > > flags, e.g. -Xmx)
> > >
> > > Regards,
> > >    Alex.
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > Current project: http://www.solr-start.com/ - Accelerating your Solr
> > > proficiency
> > >
> > > On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
> > > > I have many plain-text XML files that I convert to Solr XML format.
> > > > But every time I send them to Solr, I hit an OOM exception.
> > > > How can I configure Solr to "eat" these big XML files?
> > > > Please point me in the right direction. Thanks
> > > >
> > > > floyd
> > >
>
