lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: What is the best approach to send lots of XML Messages to Solr to build index?
Date Sun, 15 Jun 2014 15:59:53 GMT
A couple of things:

> Consider indexing them with SolrJ; here's a place to get started: http://searchhub.org/2012/02/14/indexing-with-solrj/
Especially if you use a SAX-based parser, you have more control over memory consumption,
since the parsing happens on the client. And you can run as many clients in parallel against
Solr as you need.
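
To make the SAX idea concrete, here is a minimal sketch that streams an <add><doc>...</doc></add> file and hands documents off in small batches, so only one batch is held in memory at a time. It uses only the JDK. The class name UpdateXmlBatcher, the batch size, and the sink callback are hypothetical names made up for illustration. In a real client the sink would presumably convert each batch to SolrInputDocument objects and send them with SolrJ. It assumes flat documents, with no nested child docs.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Solr or SolrJ.
public class UpdateXmlBatcher {

    /** Streams an update XML message and passes docs to the sink in batches of batchSize. */
    public static void parse(InputStream in, int batchSize,
                             Consumer<List<Map<String, List<String>>>> sink) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private List<Map<String, List<String>>> batch = new ArrayList<>();
            private Map<String, List<String>> doc;   // document currently being read
            private String fieldName;                // non-null while inside a <field>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("doc".equals(qName)) {
                    doc = new LinkedHashMap<>();
                } else if ("field".equals(qName) && doc != null) {
                    fieldName = atts.getValue("name");
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per field, so accumulate.
                if (fieldName != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("field".equals(qName) && fieldName != null) {
                    // Repeated field names become multi-valued entries.
                    doc.computeIfAbsent(fieldName, k -> new ArrayList<>()).add(text.toString());
                    fieldName = null;
                } else if ("doc".equals(qName)) {
                    batch.add(doc);
                    doc = null;
                    if (batch.size() >= batchSize) flush();
                }
            }

            @Override
            public void endDocument() {
                if (!batch.isEmpty()) flush();   // send the final partial batch
            }

            private void flush() {
                sink.accept(batch);
                batch = new ArrayList<>();       // start a fresh batch; the old one can be collected
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<add>"
                + "<doc><field name=\"id\">1</field></doc>"
                + "<doc><field name=\"id\">2</field></doc>"
                + "<doc><field name=\"id\">3</field></doc>"
                + "</add>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // Here the sink only prints; a real client would send each batch to Solr.
        parse(in, 2, batch -> System.out.println(batch.size() + " docs: " + batch));
    }
}
```

Because the sink is just a callback, the same parser can feed several sender threads or several client processes, which is the "as many clients as you need" part.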

> Here's a bunch of information about tlogs and commits that might be useful background:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
Consider setting your <autoCommit> interval quite short (15 seconds)
with openSearcher set to false. Each hard commit rolls over the tlog, so
it stays small, although how that relates to your error is something of
a mystery to me...
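
For reference, the autoCommit suggestion would look roughly like this in the updateHandler section of solrconfig.xml. This is a sketch, not a tuned configuration: maxTime is in milliseconds, so 15000 is the 15 seconds mentioned above, and the rest of your updateHandler settings stay as they are.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 15 seconds without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher set to false, these commits flush to disk but don't make new documents visible to searches; visibility still depends on a soft commit or an explicit commit that opens a searcher.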

Best,
Erick

On Sun, Jun 15, 2014 at 3:14 AM, Mikhail Khludnev
<mkhludnev@griddynamics.com> wrote:
> Hello Floyd,
>
> Did you consider disabling the tlog?
> Does each file contain many docs?
> Are you running SolrCloud? Do you use just sh/curl, or do you have a Java program?
> DIH is not really performant so far. Submitting roughly ten huge files in
> parallel is a way to get good performance. Once again, nuke the tlog.
>
>
> On Sun, Jun 15, 2014 at 12:44 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
>
>> Hi,
>> I have many XML message files formatted like this:
>> https://wiki.apache.org/solr/UpdateXmlMessages
>>
>> These files are generated by my index builder daily.
>> Currently I am sending these files to Solr through HTTP POST, but sometimes I
>> hit an OOM exception or end up with too many pending tlogs.
>>
>> Do you have a better way to "import" these files into Solr to build the index?
>>
>> Thanks for the suggestion
>>
>> Floyd
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
