lucene-solr-user mailing list archives

From Hassan <>
Subject Re: SolrCloud and Join Queries
Date Sat, 05 Jan 2013 10:55:23 GMT
Thanks Per and Otis,

It is much clearer now but I have a question about adding new solr nodes 
and collections.
I have a dedicated ZooKeeper instance. Let's say I have uploaded my 
configuration to zookeeper using "zkcli" and named it, say, 
Now I want to create a new SolrCloud from scratch with two Solr nodes. I 
need to create a new collection (with one shard) called "customer1" 
using the configuration name "configuration1". I have tried different 
approaches using the Collections API and zkcli linkconfig/downconfig, but 
I cannot get it to work: the collection is only available on one node. 
The example "collection1" works as expected, where one node has the 
leader shard and the other node has the replica. See the cloud graph

What is the correct way to dynamically add collections to already 
existing nodes and new nodes?
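
A minimal sketch of the kind of Collections API request described above, assuming a node reachable at http://localhost:8983/solr (a placeholder, not from this thread). The parameter names are the ones documented for the Collections API CREATE action in later 4.x releases; whether collection.configName is available, and whether replicationFactor counts total copies or only additional ones, depends on the Solr version, so treat this as illustrative only.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=1, replication_factor=2):
    """Build a Collections API CREATE request URL.

    base_url is an assumed Solr root such as http://localhost:8983/solr.
    replication_factor=2 is an assumption: one copy on each of two nodes.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of the configuration previously uploaded to ZooKeeper.
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url(
    "http://localhost:8983/solr",  # placeholder host and port
    "customer1",
    "configuration1",
)
print(url)
```

The URL can be requested against any live node in the cluster; with a single shard, the replication factor is what determines whether the second node receives a copy.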

Thank you,
On 05/01/13 09:07, Otis Gospodnetic wrote:
> Hi,
> I think things will work for Hassan as he described them.  The key is not
> to shard in his case, that's all.
> Hassan, yes, 1-2M docs is small. But beware of creating a crazy
> number (e.g. thousands) of collections per server, as each collection has
> some cost.
> Otis
> --
> Solr & ElasticSearch Support
> On Fri, Jan 4, 2013 at 5:28 AM, Per Steffensen <> wrote:
>> On 1/4/13 9:21 AM, Hassan wrote:
>>> Hi,
>>> I am considering SolrCloud for our applications but I have run into the
>>> limitation of not being able to use Join Queries in distributed searches.
>>> Our requirements are the following:
>>> - SolrCloud will serve many applications where each application "index"
>>> is separate from the other applications. Each application is really a
>>> customer deployment, and we need to isolate customers' data from each other.
>>> - Join queries are required. Queries will only look at one customer at a
>>> time.
>>> - Since data volume for each customer is small by Solr/Lucene standards,
>>> (1-2 million documents is small, right?
>> Yes
>>   ), we are really interested in the replication aspect of SolrCloud more
>>> than distributed search.
>>> I am considering the following SolrCloud design with questions:
>>> - Start SolrCloud with 1 shard only. This should allow join queries to
>>> work correctly since all documents will be available in the same shard
>>> (index). Is this a correct assumption?
>>> - Each customer will have its own collection in the SolrCloud.
>> You can't have only one shard and several collections. A collection
>> consists of a number of shards, and each shard "belongs" to one collection,
>> so two different collections do not use the same shard. Shard is "below"
>> collection in the concept hierarchy, so to speak.
>>   Do collections provide me with data isolation between customers?
>> Yes?
>> Depends on what you mean by "isolation". Since different collections
>> use different shards, and each shard basically has its own Lucene index
>> (a set of Lucene indices if you use replication), and distinct Lucene
>> indices typically persist in different disk folders, you will get
>> "isolation" in the sense that data for different customers will be stored
>> in different disk folders.
>>   - Adding more nodes as replicas of the single shard to achieve
>>> replication and fault tolerance.
>>> Thank you,
>>> Hs
>> Not sure I understand completely what you want to achieve, but you might
>> want to have a collection per customer. One shard per collection = one
>> shard per customer = (as long as we do not consider replication) one lucene
>> index per customer = one data-disk-folder per customer. You should be able
>> to do join queries inside the specific customer's shard.
>> Regards, Per Steffensen
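
As a rough illustration of the last point in the quoted reply, a join query confined to one customer's single-shard collection could be built as follows. This is a sketch under assumptions: the host, the collection name "customer1", and the field names (order_customer_id, id, status) are hypothetical and not from this thread; only the {!join from=... to=...} local-params syntax is Solr's own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def join_query_url(base_url, collection, from_field, to_field, inner_query):
    """Build a select URL that runs a join query inside one collection.

    The join matches documents whose to_field equals the from_field value
    of documents matching inner_query. All documents must live in the same
    index, which holds when the collection has a single shard.
    """
    q = f"{{!join from={from_field} to={to_field}}}{inner_query}"
    return f"{base_url}/{collection}/select?{urlencode({'q': q, 'wt': 'json'})}"


url = join_query_url(
    "http://localhost:8983/solr",  # placeholder host and port
    "customer1",                   # one collection per customer
    "order_customer_id",           # hypothetical field names
    "id",
    "status:open",
)

# Decode the q parameter again to show the query Solr would receive.
parsed_q = parse_qs(urlsplit(url).query)["q"][0]
print(parsed_q)
```

Because each customer has its own collection, the request path alone selects the customer, and the join never needs to cross shards.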
