lucene-solr-user mailing list archives

From Arunan Sugunakumar <arunans...@cse.mrt.ac.lk>
Subject Re: Good practices on indexing larger amount of documents at once using SolrJ
Date Tue, 24 Jul 2018 16:55:32 GMT
Dear Erick,

Unfortunately I deleted the original Solr logs, so I couldn't post it here.
But removing the hard commit from the loop solved my problem and made
indexing faster. Now there are no errors thrown from the client side.
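For anyone hitting the same issue, here is a minimal sketch of the batch-and-clear pattern discussed in this thread. The `sendToSolr` stub stands in for `solrClient.add(...)` so the sketch runs without a Solr server; the class name, batch size, and document count are all illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFlusher {
    static final int BATCH_SIZE = 1000;
    static int flushes = 0;
    static int docsSent = 0;

    // Stub standing in for solrClient.add(batch) -- no real Solr call here.
    static void sendToSolr(List<String> batch) {
        flushes++;
        docsSent += batch.size();
    }

    public static void main(String[] args) {
        flushes = 0;
        docsSent = 0;
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 12_500; i++) {          // pretend rows fetched from the DB
            batch.add("doc-" + i);
            if (batch.size() == BATCH_SIZE) {
                sendToSolr(batch);
                batch.clear();   // clear after every add, or batches grow without bound
            }
        }
        if (!batch.isEmpty()) {  // flush the remainder
            sendToSolr(batch);
        }
        System.out.println(flushes + " flushes, " + docsSent + " docs");
        // prints: 13 flushes, 12500 docs
    }
}
```

Note there is no explicit commit in the loop; with real SolrJ you would let autocommit (or a commitWithin on the add call) handle commits instead.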

Thanks
Arunan


On 22 July 2018 at 04:45, Erick Erickson <erickerickson@gmail.com> wrote:

> commitWithin parameter.
>
> Well, what I usually do is set my autocommit interval in my
> solrconfig.xml file and forget about it.
> For searching, set your autosoftcommit in solrconfig.xml and forget
> about _that_.
>
> Here's more than you want to know about the topic.
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> As for what to set them to? soft commit "as long as you can stand".
>
> For hard commit (openSearcher value doesn't really matter) I like a
> minute or so. Especially if openSearcher=false,
> then that defines the limit of how much data you'd have to replay from
> the tlog if your process terminates
> abnormally.
>
> But for your original problem, what do the solr logs say? The error
> you posted doesn't really shed any light on the root cause.
>
> Best,
> Erick
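What Erick describes might look roughly like this in solrconfig.xml (a sketch: the one-minute hard commit with openSearcher=false follows his advice above; the soft-commit interval is a placeholder to tune for how fresh your searches need to be):

```xml
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit roughly every minute -->
  <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
</autoCommit>

<autoSoftCommit>
  <maxTime>300000</maxTime>          <!-- soft commit "as long as you can stand"; placeholder value -->
</autoSoftCommit>
```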
>
> On Fri, Jul 20, 2018 at 9:39 PM, Arunan Sugunakumar
> <arunans.14@cse.mrt.ac.lk> wrote:
> > Dear Erick,
> >
> > Thank you for your reply. I initialize the ArrayList variable with a new
> > ArrayList after I add and commit the solrDocumentList into the solrClient,
> > so I don't think I have the problem of an ever-increasing ArrayList. (I
> > hope the add method in solrClient flushes the previously added
> > documents.) But as you said, I do a hard commit during the loop. I can
> > change it by adding commitWithin. What value would you recommend for
> > this type of scenario?
> >
> > Thank you,
> > Arunan
> >
> > *Sugunakumar Arunan*
> > Undergraduate - CSE | UOM
> >
> > Email : arunans.14@cse.mrt.ac.lk
> > Mobile : 0094 766016272
> > LinkedIn : https://www.linkedin.com/in/arunans23/
> >
> > On 20 July 2018 at 23:21, Erick Erickson <erickerickson@gmail.com>
> wrote:
> >
> >> I do this all the time with batches of 1,000 and don't see this problem.
> >>
> >> One thing that sometimes bites people is failing to clear the doc list
> >> after every call to add, so you send ever-increasing batches to Solr.
> >> Assuming that by batch size you mean the size of the
> >> solrDocumentList, increasing it would make the broken-pipe problem
> >> worse if anything...
> >>
> >> Also, it's generally bad practice to commit after every batch. That's not
> >> your problem here, just something to note. Let your autocommit
> >> settings in solrconfig handle it, or specify commitWithin in your
> >> add call.
> >>
> >> I'd also look in your Solr logs and see if there's a problem there.
> >>
> >> Net-net is this is a perfectly reasonable pattern, I suspect some
> >> innocent-seeming problem with your indexing code.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >>
> >> On Fri, Jul 20, 2018 at 9:32 AM, Arunan Sugunakumar
> >> <arunans.14@cse.mrt.ac.lk> wrote:
> >> > Hi,
> >> >
> >> > I have around 12 million objects in my PostgreSQL database to be
> >> > indexed. I'm running a thread to fetch the rows from the database.
> >> > The thread will also create the documents and put them in an indexing
> >> > queue. While this is happening, my main process will retrieve the
> >> > documents from the queue and index them in batches of 1000. For some
> >> > time the process runs as expected, but after a while I get an
> >> > exception.
> >> >
> >> > *[corePostProcess] org.apache.solr.client.solrj.SolrServerException:
> >> > IOException occured when talking to server at:
> >> > http://localhost:8983/solr/mine-search …
> >> > [corePostProcess] Caused by: java.net.SocketException: Broken pipe
> >> > (Write failed)
> >> > [corePostProcess]    at
> >> > java.net.SocketOutputStream.socketWrite0(Native Method)*
> >> >
> >> >
> >> > I tried increasing the batch size up to 30,000. Then I got a
> >> > different exception.
> >> >
> >> > *[corePostProcess] org.apache.solr.client.solrj.SolrServerException:
> >> > IOException occured when talking to server at:
> >> > http://localhost:8983/solr/mine-search …
> >> > [corePostProcess] Caused by: org.apache.http.NoHttpResponseException:
> >> > localhost:8983 failed to respond*
> >> >
> >> >
> >> > I would like to know whether there are any good practices for handling
> >> > such situations, such as the maximum number of documents to index
> >> > in one request.
> >> >
> >> > My environment:
> >> >
> >> > Version : solr 7.2, solrj 7.2
> >> > Ubuntu 16.04
> >> > RAM 20GB
> >> > I started Solr in standalone mode.
> >> > Number of replicas and shards : 1
> >> >
> >> > The method I used:
> >> >
> >> >     UpdateResponse response = solrClient.add(solrDocumentList);
> >> >     solrClient.commit();
> >> >
> >> >
> >> > Thanks in advance.
> >> >
> >> > Arunan
> >>
>
