lucene-solr-user mailing list archives

From tomasv
Subject shards as subset of All Shards
Date Sat, 19 Jul 2014 00:18:31 GMT
Hello, This is kind of weird, but here goes:

We are setting up a document repository (SOLR4). This will be a large (to
us) repository of approximately 500B documents. The documents are based on

Once all my documents are uploaded, we will receive new (follow-up)
information on our "people" every month (or so).

Our client-facing application has two modes: "all inclusive data" or "recent
data".
We want the "recent data" mode to query against the data in the follow-up
information only. We want the "all inclusive" mode to query against the
initial load AND the follow-up data.

We currently have 30 shards with 2 replicas of each shard (60 shards total)
in a SOLR cloud setup including a Zookeeper. This is currently hosting our
data in what will become the "all inclusive" query.

What is the best approach to a requirement such as this? (Probably not
clear enough??)
(I'm a newbie so please bear with my questions! :-)  )
1. Should we create two separate collections ("initial" and "followup")? And
then have the front end app query against each collection as needed?
2. Is it possible to index the follow-up records to specific shards and then
query only those shards when the client is in "recent data" mode? Will an
"all inclusive" query include the follow-up shards?
3. Is it possible for one collection to be a subset of a larger collection?
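
To make option 1 a bit more concrete, here is a rough sketch of how I picture
the front end choosing what to query. It only builds the request URL and does
not contact Solr. The host, the collection names ("initial" and "followup"),
and the helper function are hypothetical, and I am assuming that SolrCloud's
"collection" request parameter can list several collections in one query.

```python
from urllib.parse import urlencode

# Hypothetical base URL for the SolrCloud cluster (assumption).
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(mode, query="*:*"):
    """Build a /select URL for one of the two client modes.

    mode: "recent" -> query the follow-up collection only
          "all"    -> query the initial load AND the follow-up data
    """
    params = {"q": query, "wt": "json"}
    if mode == "recent":
        collection = "followup"
    elif mode == "all":
        collection = "initial"
        # Assumption: listing both collections here searches across both.
        params["collection"] = "initial,followup"
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return "%s/%s/select?%s" % (SOLR_BASE, collection, urlencode(params))


if __name__ == "__main__":
    print(build_select_url("recent"))
    print(build_select_url("all"))
```

If option 2 is the better route, I imagine the same helper would set a
"shards" parameter instead of "collection", but I do not know whether that is
the recommended way.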

I realize this is quite "fuzzy", but any insights are appreciated.

