lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Unbalanced CPU on SolrCloud
Date Mon, 16 Oct 2017 15:29:55 GMT
Does the load stop when you stop indexing, or does it last for some time afterwards? Is it
always the same node that behaves like this, and does it start as soon as you start indexing?
Is the load different between nodes when you are doing lighter indexing?
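
If it helps to compare what the two nodes are doing, merge activity can be sampled per core via the Metrics API. A quick sketch (assuming Solr 7's /solr/admin/metrics endpoint and its default INDEX.* core metrics; node1/node2 stand in for your actual hosts):

curl "http://node1:8983/solr/admin/metrics?group=core&prefix=INDEX.merge"
curl "http://node2:8983/solr/admin/metrics?group=core&prefix=INDEX.merge"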

--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Oct 2017, at 13:35, Mahmoud Almokadem <prog.mahmoud@gmail.com> wrote:
> 
> The transition of the load happened after I restarted the bulk insert
> process.
> 
> The size of the index on each server is about 500GB.
> 
> There are about 8 warnings on each server for "Not found segment file", like
> this:
> 
> Error getting file length for [segments_2s4]
> 
> java.nio.file.NoSuchFileException: /media/ssd_losedata/solr-home/data/documents_online_shard16_replica_n1/data/index/segments_2s4
>     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>     at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>     at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:145)
>     at java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>     at java.base/java.nio.file.Files.readAttributes(Files.java:1755)
>     at java.base/java.nio.file.Files.size(Files.java:2369)
>     at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
>     at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
>     at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:611)
>     at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:584)
>     at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:136)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2474)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:720)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:378)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:322)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     at org.eclipse.jetty.server.Server.handle(Server.java:534)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>     at java.base/java.lang.Thread.run(Thread.java:844)
> 
> On Mon, Oct 16, 2017 at 1:08 PM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
> 
>> I did not look at the graph details - now I see that it is over a 3h time span.
>> It seems that there was load on the other server before this one, ending
>> with a 14GB read spike and a 10GB write spike just before the load started
>> on this server. Do you see any errors or suspicious log lines?
>> How big is your index?
>> 
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 16 Oct 2017, at 12:39, Mahmoud Almokadem <prog.mahmoud@gmail.com> wrote:
>>> 
>>> Yes, it has been constant since I started this bulk indexing process.
>>> As you can see, the write operations on the loaded server are 3x those of
>>> the normal server, although disk writes are not 3x.
>>> 
>>> Mahmoud
>>> 
>>> 
>>> On Mon, Oct 16, 2017 at 12:32 PM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
>>> 
>>>> Hi Mahmoud,
>>>> Is this something that you see constantly? The network charts suggest that
>>>> your servers are loaded equally, which is expected since, as you said, you
>>>> are not using routing. Disk read/write and CPU are not equal, but that is
>>>> expected during heavy indexing, since indexing also triggers segment merges,
>>>> which consume those resources. Even if two nodes host the same documents
>>>> (e.g. leader and replica), their merges are not likely to happen at the same
>>>> time, so you can expect to see such cases.
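>>>> 
>>>> If merges do turn out to be the bottleneck, their behaviour can be tuned per
>>>> collection in solrconfig.xml. A minimal sketch, assuming the default
>>>> TieredMergePolicy (the values shown are illustrative, not recommendations):
>>>> 
>>>> <indexConfig>
>>>>   <!-- TieredMergePolicy is Solr's default; these knobs control how many
>>>>        segments may accumulate before a merge is triggered -->
>>>>   <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>>>>     <int name="maxMergeAtOnce">10</int>
>>>>     <int name="segmentsPerTier">10</int>
>>>>   </mergePolicyFactory>
>>>> </indexConfig>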
>>>> 
>>>> Thanks,
>>>> Emir
>>>> --
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>> 
>>>> 
>>>> 
>>>>> On 16 Oct 2017, at 11:58, Mahmoud Almokadem <prog.mahmoud@gmail.com> wrote:
>>>>> 
>>>>> Here are the screenshots of the two servers' metrics on Amazon:
>>>>> 
>>>>> https://ibb.co/kxBQam
>>>>> https://ibb.co/fn0Jvm
>>>>> https://ibb.co/kUpYT6
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Oct 16, 2017 at 11:37 AM, Mahmoud Almokadem <prog.mahmoud@gmail.com> wrote:
>>>>> 
>>>>>> Hi Emir,
>>>>>> 
>>>>>> We don't use routing.
>>>>>> 
>>>>>> The servers are already balanced, and the number of documents on each
>>>>>> shard is approximately the same.
>>>>>> 
>>>>>> Nothing is running on the servers except Solr and ZooKeeper.
>>>>>> 
>>>>>> I initialized the client as:
>>>>>> 
>>>>>> // ZooKeeper ensemble for the SolrCloud cluster
>>>>>> String zkHost = "192.168.1.89:2181,192.168.1.99:2181";
>>>>>> 
>>>>>> CloudSolrClient solrCloud = new CloudSolrClient.Builder()
>>>>>>         .withZkHost(zkHost)
>>>>>>         .build();
>>>>>> 
>>>>>> // Use document_id as the unique key field and the binary update format
>>>>>> solrCloud.setIdField("document_id");
>>>>>> solrCloud.setDefaultCollection(collection);
>>>>>> solrCloud.setRequestWriter(new BinaryRequestWriter());
>>>>>> 
>>>>>> And the documents are approximately the same size.
>>>>>> 
>>>>>> I used 10 threads, each with its own SolrClient, to send data to Solr;
>>>>>> every thread sends a batch of 1000 documents at a time (a sketch of this
>>>>>> loop follows below).
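>>>>>> 
>>>>>> For reference, a minimal sketch of that loop - not the exact code:
>>>>>> fetchNextDocs() is a hypothetical placeholder for the real document
>>>>>> source, and the sketch shares one CloudSolrClient across all threads,
>>>>>> which is safe since the client is thread-safe:
>>>>>> 
>>>>>> import java.util.List;
>>>>>> import java.util.concurrent.ExecutorService;
>>>>>> import java.util.concurrent.Executors;
>>>>>> import org.apache.solr.client.solrj.impl.CloudSolrClient;
>>>>>> import org.apache.solr.common.SolrInputDocument;
>>>>>> 
>>>>>> public class BulkIndexer {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         CloudSolrClient solrCloud = new CloudSolrClient.Builder()
>>>>>>                 .withZkHost("192.168.1.89:2181,192.168.1.99:2181")
>>>>>>                 .build();
>>>>>>         solrCloud.setDefaultCollection("documents_online");
>>>>>> 
>>>>>>         ExecutorService pool = Executors.newFixedThreadPool(10);
>>>>>>         for (int t = 0; t < 10; t++) {
>>>>>>             pool.submit(() -> {
>>>>>>                 List<SolrInputDocument> batch;
>>>>>>                 while ((batch = fetchNextDocs(1000)) != null) {
>>>>>>                     try {
>>>>>>                         // CloudSolrClient routes each document to its shard leader
>>>>>>                         solrCloud.add(batch);
>>>>>>                     } catch (Exception e) {
>>>>>>                         e.printStackTrace(); // real code should log and retry
>>>>>>                     }
>>>>>>                 }
>>>>>>             });
>>>>>>         }
>>>>>>         pool.shutdown();
>>>>>>     }
>>>>>> 
>>>>>>     // Hypothetical: returns up to n documents, or null when the source is exhausted.
>>>>>>     private static List<SolrInputDocument> fetchNextDocs(int n) {
>>>>>>         return null;
>>>>>>     }
>>>>>> }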
>>>>>> 
>>>>>> Thanks,
>>>>>> Mahmoud
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Oct 16, 2017 at 11:01 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
>>>>>> 
>>>>>>> Hi Mahmoud,
>>>>>>> Do you use routing? Are your servers equally balanced - do you end up
>>>>>>> having approximately the same number of documents hosted on both servers
>>>>>>> (counting all shards)?
>>>>>>> Do you have anything else running on those servers?
>>>>>>> How do you initialise your SolrJ client?
>>>>>>> Are documents of similar size?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Emir
>>>>>>> --
>>>>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 16 Oct 2017, at 10:46, Mahmoud Almokadem <prog.mahmoud@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> We've installed SolrCloud 7.0.1 with two nodes and 8 shards per node.
>>>>>>>> 
>>>>>>>> The configurations and the specs of the two servers are identical.
>>>>>>>> 
>>>>>>>> When running bulk indexing using SolrJ, we see that one of the servers
>>>>>>>> is fully loaded, as you can see in the images, while the other is normal.
>>>>>>>> 
>>>>>>>> Images URLs:
>>>>>>>> 
>>>>>>>> https://ibb.co/jkE6gR
>>>>>>>> https://ibb.co/hyzvam
>>>>>>>> https://ibb.co/mUpvam
>>>>>>>> https://ibb.co/e4bxo6
>>>>>>>> 
>>>>>>>> How can I figure out this issue?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Mahmoud
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

