lucene-solr-user mailing list archives

From Chris Troullis <>
Subject Re: Seeing odd behavior with implicit routing
Date Tue, 16 May 2017 12:08:31 GMT

Thanks for the response and explanation! I logged a JIRA per your request


On Mon, May 15, 2017 at 3:40 AM, Shalin Shekhar Mangar <> wrote:

> On Sun, May 14, 2017 at 7:40 PM, Chris Troullis <> wrote:
> > Hi,
> >
> > I've been experimenting with various sharding strategies with SolrCloud
> > (6.5.1), and am seeing some odd behavior when using the implicit router. I
> > am probably either doing something wrong or misinterpreting what I am
> > seeing in the logs, but if someone could help clarify, that would be
> > awesome.
> >
> > I created a collection using the implicit router, created 10 shards, named
> > shard1, shard2, etc. I indexed 3000 documents to each shard, routed by
> > setting the _route_ field on the documents in my schema. All works fine, I
> > verified there are 3000 documents in each shard.
> >
> > The odd behavior I am seeing is when I try to route a query to a specific
> > shard. I submitted a simple query to shard1 using the request parameter
> > _route_=shard1. The query comes back fine, but when I looked in the logs,
> > it looked like it was issuing 3 separate requests:
> >
> > 1. The original query to shard1
> > 2. A 2nd query to shard1 with the parameter ids=a bunch of document ids
> > 3. The original query to a random shard (changes every time I run the
> >    query)
> >
> > It looks like the first query is getting back a list of ids, and the 2nd
> > query is retrieving the documents for those ids? I assume this is some
> > SolrCloud implementation detail.
> >
> > What I don't understand is the 3rd query. Why is it issuing the original
> > query to a random shard every time, when I am specifying the _route_? The
> > _route_ parameter is definitely doing something, because if I remove it,
> > it is querying all shards (which I would expect).
> >
> > Any ideas? I can provide the actual queries from the logs if required.
>
> How many nodes is this collection distributed across? I suspect that
> you are using a single node for experimentation?
>
> What happens with the _route_=shard1 parameter and implicit routing is
> that the _route_ parameter is resolved to a list of replicas of
> shard1. But SolrJ uses only the node name of the replica along with
> the collection name to make the request (this is important, we'll come
> back to this later). So, ordinarily, that node hosts a single shard
> (shard1) and when it receives the request, it will optimize the search
> by taking the non-distributed code path (since the replica has all the
> data needed to satisfy the search).
>
> But interesting things happen when the node hosts more than one shard
> (say both shard1 and shard3). When we query such a node using just the
> collection name, the collection name can be resolved to either shard1
> or shard3 -- this is picked randomly without looking at the _route_
> parameter at all. If shard3 is picked, it looks at the request, sees
> that it doesn't have all the necessary data and decides to follow the
> two-phase distributed search path, where phase 1 is to get the ids and
> scores of the documents matching the query from all participating
> shards (the list of such shards is limited by the _route_ parameter,
> which in our case will be only shard1) and phase 2 is to get the
> actual stored fields to be returned to the user. So you get three
> queries in the log: 1) phase 1 of distributed search hitting shard1,
> 2) phase 2 of distributed search hitting shard1 and 3) the
> distributed scatter-gather search run by shard3.
>
> So to recap, this is happening because you have more than one shard
> hosted on a node. An easy workaround is to have each shard hosted on a
> unique node. But we can improve things on the Solr side as well by 1)
> having SolrJ resolve requests down to node name and core name, or 2)
> having the collection name to core name resolution take the _route_
> param into account. Either 1 or 2 can solve the problem. Can you please
> open a Jira issue?
> >
> > Thanks,
> >
> > Chris
> --
> Regards,
> Shalin Shekhar Mangar.
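
For anyone following along, the behavior described above can be sketched as a toy model in Python. This is not Solr's code, only an illustration of the explanation in this thread: the function name `simulate_routed_query`, its arguments, and the log labels are all made up for the example, and the random pick stands in for the collection-name-to-core resolution that ignores _route_.

```python
import random


def simulate_routed_query(cores_on_node, route, rng=random):
    """Toy model: return the (shard, request) log entries for one query.

    cores_on_node -- shard names whose cores live on the node that received
                     the request (hypothetical input, not a Solr structure)
    route         -- the shard named by the _route_ parameter
    """
    if route not in cores_on_node:
        raise ValueError("the request is sent to a node hosting the routed shard")

    # The request carries only node name + collection name, so the node
    # picks one of its local cores without looking at _route_.
    receiving = rng.choice(cores_on_node)

    if receiving == route:
        # The chosen core has all the data: non-distributed code path.
        return [(receiving, "non-distributed query")]

    # Otherwise the chosen core coordinates a two-phase distributed search,
    # with the participating shards limited to the routed shard.
    return [
        (route, "phase 1: ids and scores"),
        (route, "phase 2: stored fields for ids"),
        (receiving, "original query (scatter-gather coordinator)"),
    ]


if __name__ == "__main__":
    # One shard per node: always a single, non-distributed request.
    print(simulate_routed_query(["shard1"], "shard1"))
    # Several shards on one node: sometimes one request, sometimes three.
    for _ in range(5):
        print(simulate_routed_query(["shard1", "shard3"], "shard1"))
```

With a single shard per node the model always takes the non-distributed path, which matches the suggested workaround. With several shards on one node, the three-entry log appears whenever a core other than the routed one is picked.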
