lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Distributed commits in CloudSolrServer
Date Tue, 15 Apr 2014 19:50:05 GMT
Inline responses below.
-- 
Mark Miller
about.me/markrmiller

On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkeegan@gmail.com) wrote:

I have a SolrCloud index, 1 shard, with a leader and one replica, and 3 
ZKs. The Solr indexes are behind a load balancer. There is one 
CloudSolrServer client updating the indexes. The index schema includes 3 
ExternalFileFields. When the CloudSolrServer client issues a hard commit, I 
observe that the commits occur sequentially, not in parallel, on the leader 
and replica. The duration of each commit is about a minute. Most of this 
time is spent reloading the 3 ExternalFileField files. Because of the 
sequential commits, there is a period of time (1 minute+) when the index 
searchers will return different results, which can cause a bad user 
experience. This will get worse as replicas are added to handle 
auto-scaling. The goal is to keep all replicas in sync w.r.t. the user 
queries. 

My questions: 

1. Is there a reason that the distributed commits are done in sequence, not 
in parallel? Is there a way to change this behavior? 


The reason is that updates are currently done this way - it’s the only safe way to do it
without solving some more problems. I don’t think you can easily change this. I think we
should probably file a JIRA issue to track a better solution for commit handling. I think
there are some complications because of how commits can be added on update requests, but its
something we probably want to try and solve before tackling *all* updates to replicas in parallel
with the leader.



2. If instead, the commits were done in parallel by a separate client via a 
GET to each Solr instance, how would this client get the host/port values 
for each Solr instance from zookeeper? Are there any downsides to doing 
commits this way? 

Not really, other than the extra management.





Thanks, 
Peter 

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message