lucene-solr-user mailing list archives

From tedsolr <>
Subject Re: Specify sorting of merged streams
Date Thu, 21 Jul 2016 15:08:18 GMT
I can see I may need to rethink some things. I have two joins: one is 1 to 1
(very large) and one is 1 to .03. A HashJoin may work on the smaller one.
The large join looks like it may not be possible. I could get away with
treating it as a filter somehow, since I don't need the fields from the
joined documents. For example: include a col1 document (id=123) only if col2
contains a document with id=123.
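
To make the distinction concrete, here is a rough in-memory sketch in Python of the two approaches: a hash join that merges fields from both sides, and a filter that only checks whether a matching id exists. This is not Solr code and not how Solr implements either operation; the function names, the `id` key, and the sample documents are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, object]


def hash_join(left: Iterable[Doc], right: Iterable[Doc], key: str = "id") -> Iterator[Doc]:
    """Join two document streams by holding the (smaller) right side in memory."""
    table: Dict[object, List[Doc]] = {}
    for doc in right:
        table.setdefault(doc[key], []).append(doc)
    for doc in left:
        for match in table.get(doc[key], []):
            # Merge the fields; the left document wins on name clashes.
            yield {**match, **doc}


def filter_by_existence(left: Iterable[Doc], right: Iterable[Doc], key: str = "id") -> Iterator[Doc]:
    """Keep left documents whose key appears on the right; only keys are stored."""
    right_keys = {doc[key] for doc in right}
    for doc in left:
        if doc[key] in right_keys:
            yield doc


# Hypothetical sample data standing in for the two collections.
col1 = [{"id": 123, "name": "a"}, {"id": 456, "name": "b"}]
col2 = [{"id": 123, "extra": "x"}]

joined = list(hash_join(col1, col2))
filtered = list(filter_by_existence(col1, col2))
```

The filter version keeps only the set of ids from col2 rather than whole documents, which is why it may be cheaper than a join when none of the col2 fields are needed.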

This whole chain is a real-time user search. A 1-2 sec response would be
ideal, but I'm sacrificing speed in order to get the reindexing to run much

Concurrency is low - like a dozen. Have you read any blogs on balancing #
shards vs # replicas? Any guidelines on estimating the number of VMs this
may require would be great.

Joel Bernstein wrote
> A few other things for you to consider:
> 1) How big are the joins?
> 2) How fast do they need to go?
> 3) How many queries need to run concurrently?
> #1 and #2 will dictate how many shards, replicas and parallel workers are
> needed to perform the join. #3 needs to be carefully considered because
> MapReduce distributed joins are not going to scale like traditional Solr
> queries.
