lucene-solr-user mailing list archives

From Darren Govoni
Subject Re: basic solr cloud questions
Date Thu, 29 Sep 2011 15:11:40 GMT
Agree. Thanks also for clarifying. It helps.

On 09/29/2011 08:50 AM, Yury Kats wrote:
> On 9/29/2011 7:22 AM, Darren Govoni wrote:
>> That was kinda my point. The "new" cloud implementation
>> is not about replication, nor should it be. But rather about
>> horizontal scalability where "nodes" manage different parts
>> of a unified index.
> It's about many things. You stated one, but there are other goals,
> one of them being tolerance to node outages. In a cloud, when
> one of your many nodes fails, you don't want to stop querying and
> indexing. For this to happen, you need to maintain redundant copies
> of the same pieces of the index, hence you need to replicate.
>> One of the design goals of the "new" cloud
>> implementation is for this to happen more or less automatically.
> True, but there is a big gap between goals and current state.
> Right now, there is distributed search, but not distributed indexing
> or auto-sharding, or auto-replication. So if you want to use the SolrCloud
> now (as many of us do), you need to do a number of things yourself,
> even if they might be done by SolrCloud automatically in the future.
>> To me that means one does not have to manually distribute
>> documents or enforce replication as Yury suggests.
>> Replication is different to me than what was being asked.
>> And perhaps I misunderstood the original question.
>> Yury's response introduced the term "core" where the original
>> person was referring to "nodes". For all I know, those are two
>> different things in the new cloud design terminology (I believe they are).
>> I guess understanding "cores" vs. "nodes" vs "shards" is helpful. :)
> Shard is a slice of index. Index is managed/stored in a core.
> Nodes are Solr instances, usually physical machines.
> Each node can host multiple shards, and each shard can consist of multiple cores.
> However, all cores within the same shard must have the same content.
> This is where the OP ran into the problem. The OP had 1 shard, consisting of two
> cores on two nodes. Since there is no distributed indexing yet, all documents were
> indexed into a single core. However, there is distributed search, therefore queries
> were sent randomly to different cores of the same shard. Since one core in the shard
> had documents and the other didn't, the query result was random.
> To solve this problem, the OP must make sure all cores within the same shard (be they
> on the same node or not) have the same content. This can currently be achieved by:
> a) setting up replication between cores: you index into one core and the other core
> replicates the content
> b) indexing into both cores
> Hope this clarifies.
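
For anyone who finds the terminology easier to follow in code, here is a rough Python sketch of the situation Yury describes. It is only a toy in-memory model, not Solr code: the Core and Shard classes and their methods are made-up names for illustration, and "search" is just a substring match. It shows one shard with two cores on two nodes, why queries look random when only one core was indexed, and how option (b) makes them consistent.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical stand-in for one core: a physical copy of a shard's index."""
    node: str                                  # the node (Solr instance) hosting it
    docs: dict = field(default_factory=dict)   # doc id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        # Toy search: ids of documents whose text contains the term.
        return sorted(d for d, t in self.docs.items() if term in t)


@dataclass
class Shard:
    """Hypothetical stand-in for a shard: all of its cores should hold the same content."""
    cores: list

    def query(self, term, rng=random):
        # Distributed search: any core of the shard may answer the query.
        return rng.choice(self.cores).search(term)

    def index_all(self, doc_id, text):
        # Option (b): the client indexes the document into every core itself.
        for core in self.cores:
            core.add(doc_id, text)

    def consistent(self):
        return all(c.docs == self.cores[0].docs for c in self.cores)


# The OP's setup: 1 shard, 2 cores on 2 nodes, documents indexed into one core only.
shard = Shard([Core("node1"), Core("node2")])
shard.cores[0].add("1", "solr cloud")
print("consistent:", shard.consistent())
print("five queries:", [shard.query("solr") for _ in range(5)])

# After indexing into both cores, every query sees the same documents.
shard.index_all("1", "solr cloud")
print("consistent:", shard.consistent())
print("five queries:", [shard.query("solr") for _ in range(5)])
```

Option (a) would replace index_all with a step that copies one core's content to the others after indexing; either way the point is that every core in a shard ends up with identical content.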
