lucene-solr-user mailing list archives

From Roger Lehmann <roger.lehm...@offerista.com>
Subject Re: Solr 7 not removing a node completely due to too small thread pool
Date Wed, 03 Apr 2019 11:16:17 GMT
Oh great, thanks for the hint!
I've upvoted this issue, since I think it would be worth making that
(rather low) thread pool size configurable.

On Wed, 3 Apr 2019 at 10:23, Shalin Shekhar Mangar <shalinmangar@gmail.com>
wrote:

> Thanks Roger. This was reported earlier but missed our attention.
>
> The issue is https://issues.apache.org/jira/browse/SOLR-11208
>
> On Tue, Apr 2, 2019 at 5:56 PM Roger Lehmann <roger.lehmann@offerista.com>
> wrote:
>
> > To be more specific: I currently have 19 collections, and each node has
> > exactly one replica per collection. A new node will automatically create
> > new replicas on itself, one for each existing collection (see
> > cluster-policy above).
> > So when removing a node, all 19 of its collection replicas need to be
> > removed. This can't be done in one go, because the thread count (parallel
> > synchronous execution) is only 10 and does not scale up when necessary.
> >
> > On Fri, 29 Mar 2019 at 14:20, Roger Lehmann <roger.lehmann@offerista.com>
> > wrote:
> >
> > > Situation
> > >
> > > I'm currently trying to set up SolrCloud in an AWS Autoscaling Group,
> so
> > > that it can scale dynamically.
> > >
> > > I've also added the following triggers to Solr, so that each node will
> > > have 1 (and only one) replication of each collection:
> > >
> > > {
> > >   "set-cluster-policy": [
> > >     {"replica": "<2", "shard": "#EACH", "node": "#EACH"}
> > >   ],
> > >   "set-trigger": [{
> > >     "name": "node_added_trigger",
> > >     "event": "nodeAdded",
> > >     "waitFor": "5s",
> > >     "preferredOperation": "ADDREPLICA"
> > >   },{
> > >     "name": "node_lost_trigger",
> > >     "event": "nodeLost",
> > >     "waitFor": "120s",
> > >     "preferredOperation": "DELETENODE"
> > >   }]
> > > }
> > >
> > > This works pretty well. But my problem is that when a node gets
> > > removed, it doesn't remove all 19 replicas from that node, and I have
> > > problems when accessing the "nodes" page:
> > >
> > > (screenshot of the nodes page: https://i.stack.imgur.com/QyJrY.png)
> > >
> > > In the logs, this exception occurs:
> > >
> > > Operation deletenode failed: java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$45/1104948431@467049e2 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@773563df [Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 1]
> > >     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
> > >     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
> > >     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:194)
> > >     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
> > >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteCore(DeleteReplicaCmd.java:276)
> > >     at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:95)
> > >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.cleanupReplicas(DeleteNodeCmd.java:109)
> > >     at org.apache.solr.cloud.api.collections.DeleteNodeCmd.call(DeleteNodeCmd.java:62)
> > >     at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:292)
> > >     at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:496)
> > >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >     at java.lang.Thread.run(Thread.java:748)
> > >
> > > Problem description
> > >
> > > So, the problem is that it only has a pool size of 10, all 10 of which
> > > are busy, and nothing gets queued (synchronous execution). In fact, it
> > > really only removed 10 replicas and the other 9 replicas stayed there.
> > > When manually sending the API command to delete this node it works fine,
> > > since Solr only needs to remove the remaining 9 replicas, and everything
> > > is good again.
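The rejection described above can be reproduced in isolation with plain JDK classes (a minimal sketch, not Solr code; the class name is made up, and only the task count of 19 and pool size of 10 come from the report). A fixed pool of 10 threads backed by a SynchronousQueue accepts no queued work, so every task beyond the 10th is rejected, which matches the "pool size = 10, active threads = 10, queued tasks = 0" state in the stack trace:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolRejectDemo {
    public static void main(String[] args) throws InterruptedException {
        // Fixed pool of 10 threads with a SynchronousQueue: no task is ever
        // queued, mirroring the executor shown in the DELETENODE stack trace.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch hold = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < 19; i++) {     // 19 replica deletions, as reported
            try {
                // each task blocks, so all 10 workers stay busy
                pool.execute(() -> {
                    try { hold.await(); } catch (InterruptedException ignored) {}
                });
            } catch (RejectedExecutionException e) {
                rejected++;                // tasks 11..19 end up here
            }
        }
        System.out.println("rejected=" + rejected);  // prints rejected=9
        hold.countDown();                  // release the blocked workers
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With no queue and only 10 workers, 9 of the 19 tasks are rejected, matching the observation that exactly 10 replicas were removed.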
> > > Question
> > >
> > > How can I either increase this (small) thread pool size and/or activate
> > > queueing of the remaining deletion tasks? Another solution might be to
> > > retry the failed tasks until they succeed.
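For the queueing option asked about here, the same sketch shows how the executor would behave with an unbounded LinkedBlockingQueue instead of a SynchronousQueue: surplus tasks wait for a free worker rather than being rejected. This only illustrates the JDK executor semantics, it is not a Solr configuration knob; making Solr's pool configurable is what SOLR-11208 tracks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // Same fixed pool of 10, but with an unbounded LinkedBlockingQueue:
        // tasks beyond the pool size queue up instead of being rejected.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch hold = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < 19; i++) {     // 19 replica deletions, as before
            try {
                pool.execute(() -> {
                    try { hold.await(); } catch (InterruptedException ignored) {}
                });
            } catch (RejectedExecutionException e) {
                rejected++;                // never reached with this queue
            }
        }
        // 10 tasks run immediately, the remaining 9 wait in the queue
        System.out.println("rejected=" + rejected + " queued=" + pool.getQueue().size());
        hold.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```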
> > >
> > > I'm using Solr 7.7.1 on Ubuntu Server, installed with the installation
> > > script from Solr (so I guess it's using Jetty?).
> > >
> > > Thanks for your help!
> > >
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


-- 

Roger Lehmann
Linux System Engineer

T: 0351-418 894-76

roger.lehmann@offerista.com
https://www.xing.com/profile/Roger_Lehmann8
https://www.offerista.com/

__________________________________________

Offerista Group GmbH | Schützenplatz 14 | D - 01067 Dresden
Geschäftsführung: Tobias Bräuer, Benjamin Thym
Sitz Dresden | Amtsgericht Dresden | HRB 28678
