lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From MitchK <>
Subject Re: Distributed Search Components
Date Mon, 21 Jun 2010 10:11:12 GMT


> I honestly don't know what you mean by "those implementations" and "both 
> implementations" ... impls of what? 
I mean the implementation of the distributed search in Solr. Those classes
that are responsible for the search-logic. I mean, from somewhere the
searcher (or whatever) must get the knowledge about which shards exists,
which of them to query and what their adresses are. 
I want to learn more about the class, that manages this logic. Unfortunately
I don't know which class it is.

With "those" implementations I mean "MultiSearcher" and "solr's
implementation of distributed search".

Were my words not clear enough or used I the wrong vocabulary?

> Let's use a concrete example ... imagine you are only dealing with 
> QueryComponent and FacetComponent, and imagine we have a single 
> "coordinatorX" server that we query, and it distributes to two distinct 
> shard servers ("shardA" and "shardB") 
> the first thing QueryComponent on the coordinatorX server cares about is 
> asking shardA and shardB for the docIds of the docs they have that match.   
> the first thing FacetComponent on coordinatorX cares about is knowing the 
> top facet constraints for the matching docs from shardA and shardB -- both 
> of those pieces of information can be computed in a single request to each 
> shard, in which the shard computes both pieces of information (it's top 
> scoring documents and it's facet constraints with the highest counts) in a 
> single pass.  When coordinatorX gets those responses back, it's 
> QueryComponent can sort the "score,docId,shard" tuples to decide which 
> shards it needs to ask for the stored fields of which docIds in order to 
> build the final list of matching docs; and coordinatorX's FacetComponent 
> can sort the "constraint,sum(shardCounts)" to decide which constraints 
> should be in the final response, but since a constraint in that list 
> because it had a highcount from shardB might not have been in the initial 
> list from shardA, it needs to ask for the final count from shardA. 
> These subsequent pieces of info for both the QueryComponent and the 
> FacetComponent can be fetched from each shard in another single request, 
> and although they may not be computed in a single pass, we still only have 
> hte overhead of one network request instead of two or more. 
> On the otherhand, if coordinatorX just dela with shardA and shardB using 
> an abstractiong at the Searcher level using something like MultiSearcher, 
> then things like distributed faceting would require a *huge* amount of 
> network IO as things like using the TermEnums and TermDocs on coordinatorX 
> would result in all of that data being streamed from the individual 
> (remote) searchers for each shard so the coordinator could execute the 
> neccessary counting logic. 

I honestly thought that the MultiSearcher would exactly do what you
described here. What a missunderstanding of mine.
Thanks for the clearification, Hoss.

Kind regards
- Mitch
View this message in context:
Sent from the Solr - Dev mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message