lucene-solr-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: SOLR-11504: Provide a config to restrict number of indexing threads
Date Thu, 02 Nov 2017 10:26:22 GMT
Actually, it's one Lucene segment per *concurrent* indexing thread.

So if you have 10 indexing threads in Lucene at once, then 10 in-memory
segments will be created and will have to be written on refresh/commit.
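The toy Java model below makes the point about *concurrent* threads concrete. It assumes a simplified writer in which each in-flight addDocument call checks out a private in-memory buffer, and a new buffer is created only when every existing one is busy. The class and method names (PerThreadBuffers, addDocument, pendingSegments) are hypothetical. Lucene's real per-thread writers inside IndexWriter are more involved. In this model, ten threads that are inside addDocument at the same moment leave ten buffers, which means ten segments to write on commit. Ten sequential calls reuse a single buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CyclicBarrier;

// Toy model, NOT Lucene's real classes: each in-flight addDocument call checks out
// a private in-memory buffer, and a new buffer is created only when all existing
// buffers are busy. A commit would write every non-empty buffer as its own segment.
class PerThreadBuffers {
    private static final class Buffer {
        int docs;
    }

    private final Deque<Buffer> free = new ArrayDeque<>();
    private final List<Buffer> all = new ArrayList<>();

    private synchronized Buffer checkout() {
        Buffer b = free.poll();
        if (b == null) {          // every existing buffer is in use by another thread
            b = new Buffer();
            all.add(b);
        }
        return b;
    }

    private synchronized void release(Buffer b) {
        free.push(b);
    }

    /** Adds one document; whileHolding runs while this thread owns its buffer. */
    void addDocument(Runnable whileHolding) {
        Buffer b = checkout();
        try {
            b.docs++;             // buffer is owned exclusively by this thread here
            whileHolding.run();
        } finally {
            release(b);
        }
    }

    /** Number of segments a commit would have to write. */
    synchronized int pendingSegments() {
        int n = 0;
        for (Buffer b : all) {
            if (b.docs > 0) n++;
        }
        return n;
    }

    /** Starts `threads` threads that are all inside addDocument at the same time. */
    static int segmentsAfterConcurrentAdds(int threads) throws InterruptedException {
        PerThreadBuffers writer = new PerThreadBuffers();
        CyclicBarrier allInside = new CyclicBarrier(threads);
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> writer.addDocument(() -> {
                try {
                    allInside.await(); // wait until every thread holds a buffer
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }));
            started.add(t);
            t.start();
        }
        for (Thread t : started) t.join();
        return writer.pendingSegments();
    }

    /** Adds `docs` documents one after another from a single thread. */
    static int segmentsAfterSequentialAdds(int docs) {
        PerThreadBuffers writer = new PerThreadBuffers();
        for (int i = 0; i < docs; i++) writer.addDocument(() -> {});
        return writer.pendingSegments();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("10 concurrent adds -> segments: " + segmentsAfterConcurrentAdds(10)); // 10
        System.out.println("10 sequential adds -> segments: " + segmentsAfterSequentialAdds(10)); // 1
    }
}
```

The segment count follows the peak number of simultaneous callers, not the total number of threads that ever called addDocument.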

Elasticsearch uses a bounded thread pool to service all indexing requests,
which I think is a healthy approach.  It shouldn't have to be the client's
job to worry about server side details like this.
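The sketch below shows the bounded-pool idea using only java.util.concurrent. It has a fixed number of indexing threads and a bounded queue where excess requests wait on the server side. Once that queue is full, new requests are rejected. The class name and the sizes (2 threads, a queue of 3) are assumptions for illustration. This is not Elasticsearch's or Solr's actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a server-side bounded indexing pool: a fixed number of indexing
// threads plus a bounded queue for requests waiting on them. Sizes are assumed
// values, not Elasticsearch or Solr defaults.
class BoundedIndexingPool {
    private final ThreadPoolExecutor pool;

    BoundedIndexingPool(int indexingThreads, int queueCapacity) {
        pool = new ThreadPoolExecutor(
                indexingThreads, indexingThreads,          // fixed thread count
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // requests waiting for a thread
                new ThreadPoolExecutor.AbortPolicy());     // reject once the queue is full
    }

    /** Returns false when the request is rejected, so the caller can answer "busy, retry later". */
    boolean submit(Runnable indexRequest) {
        try {
            pool.execute(indexRequest);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    /** Submits requests while all indexing threads are busy; returns {accepted, rejected, peakActive}. */
    static int[] demo(int indexingThreads, int queueCapacity, int requests) throws InterruptedException {
        BoundedIndexingPool server = new BoundedIndexingPool(indexingThreads, queueCapacity);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peakActive = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            boolean ok = server.submit(() -> {
                peakActive.accumulateAndGet(active.incrementAndGet(), Math::max);
                try {
                    release.await(); // keep the thread busy until all requests are submitted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
            if (ok) accepted++; else rejected++;
        }
        release.countDown();
        server.shutdown();
        return new int[] {accepted, rejected, peakActive.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = demo(2, 3, 10);
        // 2 requests go to threads and 3 are queued; the remaining 5 are rejected.
        System.out.println("accepted=" + r[0] + " rejected=" + r[1] + " peakActive=" + r[2]);
    }
}
```

With all indexing threads held busy, `demo(2, 3, 10)` accepts 5 requests and rejects the other 5. Bounding the queue as well as the threads keeps a burst of clients from piling up unbounded work on the server.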

Mike McCandless

http://blog.mikemccandless.com

On Thu, Nov 2, 2017 at 5:23 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:

> Hi Nawab,
>
> > One indexing thread in Lucene corresponds to one segment being written.
> > I need fine control over the number of segments.
>
> I didn’t check the code, but I would be surprised if that is how things
> work. It may appear to work like that if each client thread is doing
> commits. Is that the case?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 1 Nov 2017, at 18:00, Nawab Zada Asad Iqbal <khichi@gmail.com> wrote:
> >
> > Well, the reason I want to control the number of indexing threads is to
> > restrict the number of "segments" being created in RAM at one time. One
> > indexing thread in Lucene corresponds to one segment being written. I need
> > fine control over the number of segments. With fewer threads than that, I
> > will not be fully utilizing my write capacity. On the other hand, if I have
> > more threads, I will end up with many more small segments, which I will
> > need to flush frequently and then merge, and that causes a different kind
> > of problem.
> >
> > Your suggestion would require me and other such Solr users to create tight
> > coupling between the clients and the Solr servers. My client is not SolrJ
> > based. In a scenario where I am connecting and indexing to Solr remotely, I
> > want the extra requests to wait on the Solr side, so that they start
> > writing as soon as an indexing thread is available, rather than waiting on
> > my client side, on the other side of the wire.
> >
> > Thanks
> > Nawab
> >
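The following back-of-the-envelope sketch illustrates the thread-count trade-off described above. It rests on a loudly simplified assumption: the active indexing threads share one RAM buffer (`ramBufferMb`, in the spirit of Lucene's RAM buffer setting) evenly, so each flushed segment is on the order of `ramBufferMb / threads`. Real flush behavior differs by a constant factor and depends on load. The buffer size and data volume below are made-up values.

```java
// Back-of-the-envelope model, an assumption rather than Lucene's exact flush policy:
// the active indexing threads share one RAM buffer evenly, so each flushed
// segment is on the order of ramBufferMb / threads.
class SegmentSizeEstimate {
    /** Approximate size of one flushed segment, in MB. */
    static double segmentSizeMb(double ramBufferMb, int threads) {
        return ramBufferMb / threads;
    }

    /** Approximate number of segments flushed (and later merged) for totalMb of input. */
    static long flushedSegments(double totalMb, double ramBufferMb, int threads) {
        return (long) Math.ceil(totalMb / segmentSizeMb(ramBufferMb, threads));
    }

    public static void main(String[] args) {
        double ramBufferMb = 100;   // assumed RAM buffer size
        double totalMb = 10_000;    // assumed volume of indexed data
        for (int threads : new int[] {2, 8, 32}) {
            System.out.printf("threads=%2d  ~segment=%6.2f MB  ~segments=%d%n",
                    threads, segmentSizeMb(ramBufferMb, threads),
                    flushedSegments(totalMb, ramBufferMb, threads));
        }
    }
}
```

Under these assumptions, going from 2 to 32 threads shrinks segments from about 50 MB to about 3 MB. It also multiplies the number of flushed segments by 16 (200 vs 3200), which is the flush-and-merge cost described above.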
> > On Wed, Nov 1, 2017 at 7:11 AM, Shawn Heisey <apache@elyograg.org> wrote:
> >
> >> On 10/31/2017 4:57 PM, Nawab Zada Asad Iqbal wrote:
> >>
> >>> I hit this issue, https://issues.apache.org/jira/browse/SOLR-11504, while
> >>> migrating to Solr 6, and have worked around it locally in the Lucene code.
> >>> I am thinking of fixing it properly and hopefully contributing the patch
> >>> back to Solr. Since Lucene does not want to keep any such config, I am
> >>> thinking of using a counting semaphore in the Solr code before calling
> >>> IndexWriter.addDocument(s) or IndexWriter.updateDocument(s).
> >>>
> >>
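The following sketch shows the counting-semaphore approach. It assumes a hypothetical `DocumentSink` interface in place of `IndexWriter.addDocument(s)`, so the example runs without Lucene on the classpath. `ThrottledSink` and `maxIndexingThreads` are illustrative names, not the SOLR-11504 patch. Request threads beyond the permit count block in `acquire()` on the server side, which matches the server-side waiting Nawab describes in his Nov 1 reply.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledIndexing {
    /** Hypothetical stand-in for IndexWriter.addDocument, so the sketch runs without Lucene. */
    interface DocumentSink {
        void addDocument(String doc) throws Exception;
    }

    /** Lets at most maxIndexingThreads callers into the delegate at once; the rest block. */
    static final class ThrottledSink implements DocumentSink {
        private final DocumentSink delegate;
        private final Semaphore permits;

        ThrottledSink(DocumentSink delegate, int maxIndexingThreads) {
            this.delegate = delegate;
            this.permits = new Semaphore(maxIndexingThreads, true); // fair: waiters proceed in FIFO order
        }

        @Override
        public void addDocument(String doc) throws Exception {
            permits.acquire();           // blocks while maxIndexingThreads calls are in progress
            try {
                delegate.addDocument(doc);
            } finally {
                permits.release();       // always hand the permit back, even on failure
            }
        }
    }

    /** Runs `docs` requests on `requestThreads` threads; returns the peak number inside the delegate. */
    static int peakConcurrency(int requestThreads, int maxIndexingThreads, int docs) throws InterruptedException {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        DocumentSink slowIndex = d -> {
            peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
            try {
                Thread.sleep(2); // simulated indexing work
            } finally {
                inside.decrementAndGet();
            }
        };
        DocumentSink sink = new ThrottledSink(slowIndex, maxIndexingThreads);
        ExecutorService requests = Executors.newFixedThreadPool(requestThreads);
        for (int i = 0; i < docs; i++) {
            String doc = "doc-" + i;
            requests.submit(() -> {
                sink.addDocument(doc);
                return null;
            });
        }
        requests.shutdown();
        requests.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 16 request threads compete, but at most 4 are ever inside addDocument.
        System.out.println("peak concurrent addDocument calls: " + peakConcurrency(16, 4, 200));
    }
}
```

A fair semaphore admits waiting requests roughly in arrival order. A non-fair one may give better throughput at the cost of that ordering.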
> >> There's a fairly simple way to control the number of indexing threads that
> >> doesn't require ANY changes to Solr: Don't start as many threads/processes
> >> on your indexing client(s). If you control the number of simultaneous
> >> requests sent to Solr, then Solr won't start as many indexing threads.
> >> That kind of control over your indexing system is something that's always
> >> preferable to have.
> >>
> >> Thanks,
> >> Shawn
> >>
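Shawn's alternative keeps the limit on the client. The sketch below assumes a hypothetical `sendBatch` method that stands in for an HTTP POST of one batch to Solr's update handler. With a fixed pool of `senders` threads, this client never has more than that many update requests in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Client-side throttling: a fixed pool of `senders` threads means this client never
// has more than `senders` update requests in flight to Solr at once.
class ClientSideThrottle {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger(); // highest in-flight count observed

    /** Hypothetical stand-in for POSTing one batch to Solr's update handler; returns docs sent. */
    static int sendBatch(List<String> batch) throws InterruptedException {
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        try {
            Thread.sleep(2); // simulated network round trip + indexing time
            return batch.size();
        } finally {
            inFlight.decrementAndGet();
        }
    }

    /** Sends every batch using at most `senders` concurrent requests; returns total docs sent. */
    static int indexAll(List<List<String>> batches, int senders) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(senders);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<String> batch : batches) {
                tasks.add(() -> sendBatch(batch));
            }
            int sent = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) { // waits for all batches
                sent += result.get();
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> batches = new ArrayList<>();
        for (int b = 0; b < 50; b++) {
            batches.add(List.of("doc-" + b + "-a", "doc-" + b + "-b"));
        }
        System.out.println("sent=" + indexAll(batches, 4) + " peakInFlight=" + peak.get());
    }
}
```

Sending 50 two-document batches with 4 senders delivers all 100 documents and never has more than 4 requests in flight. The trade-off Nawab raises still applies: excess work waits in the client's executor rather than on the server.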
>
>
