lucene-solr-user mailing list archives

From Floyd Wu <floyd...@gmail.com>
Subject Re: What is the best approach to send lots of XML Messages to Solr to build index?
Date Mon, 16 Jun 2014 03:02:55 GMT
Hi Erick, thanks for your advice. autoCommit is configured to 30 seconds in my
environment.
I'm using C# to develop the main system and run Solr as a service, so using SolrJ
isn't an option (for now).
I'm looking for a better way to directly import the offline-generated
XML files to build the index.
Currently I'm using my own C# code to send these XML files one by one
over HTTP, but performance is poor (sending them in parallel either hits OOM or
generates lots of tlog files).
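Sending files one by one pays one HTTP round trip per file. A minimal sketch of one way to cut that overhead, assuming each offline file is a small `<add>` message in the UpdateXmlMessages format: merge the `<doc>` elements from many files into a few larger `<add>` batches before posting. It is written in Python for brevity (the same logic ports to C#); the function names and the batch size of 500 are hypothetical choices, not Solr settings.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List


def _to_add(docs: List[ET.Element]) -> bytes:
    """Wrap a list of <doc> elements in a single <add> update message."""
    add = ET.Element("add")
    add.extend(docs)
    return ET.tostring(add, encoding="utf-8")


def batch_docs(messages: Iterable[bytes], batch_size: int = 500) -> Iterator[bytes]:
    """Merge <doc> elements from many small <add> messages into fewer, larger ones.

    Assumes each message is an <add> (or a bare <doc>) small enough to parse
    in memory. Attributes on <add> such as commitWithin or overwrite are not
    carried over, and <delete> commands are ignored.
    batch_size is a hypothetical tuning value.
    """
    batch: List[ET.Element] = []
    for raw in messages:
        root = ET.fromstring(raw)
        docs = [root] if root.tag == "doc" else root.findall("doc")
        for doc in docs:
            batch.append(doc)
            if len(batch) >= batch_size:
                yield _to_add(batch)
                batch = []
    if batch:
        # Flush the last, partially filled batch.
        yield _to_add(batch)
```

For example, feed it `p.read_bytes()` for each file in the index builder's output directory and POST each yielded body to the `/update` handler. Fewer, larger requests usually index faster than many tiny ones, but very large batches raise per-request memory on the Solr side, so the batch size is worth tuning.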

The main question is actually: what is the best (or at least a better) way to
rebuild the whole index from scratch?
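One common shape for a full rebuild is to send update batches with a small, fixed number of concurrent requests and issue a single commit at the end instead of committing per file. Mikhail suggests roughly ten files in parallel. The sketch below is a minimal Python version under stated assumptions. The core URL and `max_workers=4` are hypothetical placeholders. `send` is injectable so it can be tested without a server. The `<commit/>` body and the `/update` handler accepting `text/xml` are standard Solr XML update messages. Erick's `autoCommit` advice still governs how the tlog is truncated during the run.

```python
import urllib.request
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List

# Hypothetical endpoint: adjust host, port and core/collection name.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def post_xml(body: bytes, url: str = UPDATE_URL) -> int:
    """POST one XML update message to Solr's /update handler and return the HTTP status.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def rebuild(batches: Iterable[bytes],
            send: Callable[[bytes], int] = post_xml,
            max_workers: int = 4) -> List[int]:
    """Send update batches with at most max_workers requests in flight, then commit once."""
    statuses: List[int] = []
    in_flight = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for body in batches:
            # Block until a slot frees up, so neither the client nor Solr
            # is handed an unbounded number of requests at once.
            if len(in_flight) >= max_workers:
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                statuses.extend(f.result() for f in done)
            in_flight.add(pool.submit(send, body))
        done, _ = wait(in_flight)
        statuses.extend(f.result() for f in done)
    # A single explicit hard commit at the end instead of one per file.
    statuses.append(send(b"<commit/>"))
    return statuses
```

The manual `wait` loop is used instead of `pool.map` because `Executor.map` submits every item up front, which would hold all batches in memory at once.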

Floyd

2014-06-15 23:59 GMT+08:00 Erick Erickson <erickerickson@gmail.com>:

> A couple of things:
>
> - Consider indexing them with SolrJ; here's a place to get started:
> http://searchhub.org/2012/02/14/indexing-with-solrj/. Especially if you
> use a SAX-based parser, you have more control over memory consumption, since
> it's on the client after all. And you can run as many clients against
> Solr in parallel as you need.
>
> - Here's a bunch of information about tlogs and commits that might be
> useful background:
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Consider setting your <autoCommit> interval quite short (15 seconds)
> with openSearcher set to false. That will truncate your tlog, although
> how that relates to your error is something of a mystery to me...
>
> Best,
> Erick
>
> On Sun, Jun 15, 2014 at 3:14 AM, Mikhail Khludnev
> <mkhludnev@griddynamics.com> wrote:
> > Hello Floyd,
> >
> > Have you considered disabling the tlog?
> > Does each file contain many docs?
> > Do you use SolrCloud? Do you use just sh/curl, or do you have a Java program?
> > DIH is not very performant so far. Submitting roughly ten huge files in
> > parallel is a good way to get good throughput. Once again, nuke the tlog.
> >
> >
> > On Sun, Jun 15, 2014 at 12:44 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
> >
> >> Hi,
> >> I have many XML message files formatted like this:
> >> https://wiki.apache.org/solr/UpdateXmlMessages
> >>
> >> These files are generated daily by my index builder.
> >> Currently I am sending these files to Solr through HTTP POST, but sometimes I
> >> hit an OOM exception or end up with too many pending tlogs.
> >>
> >> Is there a better way to "import" these files into Solr to build the index?
> >>
> >> Thanks for any suggestions.
> >>
> >> Floyd
> >>
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> >  <mkhludnev@griddynamics.com>
>
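Erick's point about a SAX-based parser is that the client then holds only one document at a time, however large the input file is. A minimal Python sketch of that idea, assuming the input uses the `<add><doc><field name="...">` layout from UpdateXmlMessages. The `DocHandler` class and the dict-of-lists record shape are hypothetical. The sketch leaves open what the callback does with each record (re-batching it or sending it through a client). In C#, `System.Xml.XmlReader` provides the same forward-only style of reading.

```python
import xml.sax
from typing import Callable, Dict, List, Optional

Doc = Dict[str, List[str]]


class DocHandler(xml.sax.ContentHandler):
    """SAX handler that hands each <doc> to a callback as soon as it is complete.

    Only one document is held in memory at a time, however large the file is.
    """

    def __init__(self, on_doc: Callable[[Doc], None]) -> None:
        super().__init__()
        self.on_doc = on_doc
        self.doc: Optional[Doc] = None      # fields of the <doc> being read
        self.field: Optional[str] = None    # name attribute of the open <field>
        self.text: List[str] = []           # character chunks of that field

    def startElement(self, name, attrs):
        if name == "doc":
            self.doc = {}
        elif name == "field" and self.doc is not None:
            self.field = attrs.get("name")
            self.text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so collect them.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "field" and self.field is not None:
            # Multi-valued fields repeat the same name, so keep a list per name.
            self.doc.setdefault(self.field, []).append("".join(self.text))
            self.field = None
        elif name == "doc" and self.doc is not None:
            self.on_doc(self.doc)
            self.doc = None
```

For a file on disk, call `xml.sax.parse(path, DocHandler(callback))`. The parser reads the file incrementally, so memory use does not grow with file size.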
