lucene-solr-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Solr Cloud wiki and branch notes
Date Sun, 17 Jan 2010 14:06:00 GMT
On 2010-01-16 21:11, Yonik Seeley wrote:

>> Agreed - but it could be as simple as qualifying this with "from shardX on
>> node2".
>
> Right - it's pretty clear there are both physical and logical
> shards... but it's less clear to me at this point if distinguishing
> them in the vocabulary helps or hurts.

You _are_ distinguishing them, you just use "physical" and "logical" :) 
I'm in favor of using "shard" for the logical entity, and "copy" or 
"replica" for the physical one. Whichever term we choose, we need to be 
clear about this distinction because multiple physical copies (replicas) 
may be deployed to multiple nodes, even though they contribute only one 
logical shard.
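
To make the distinction concrete, here is a minimal sketch in Python. All names (ClusterLayout, deploy, the node and shard ids) are hypothetical and are not part of any Solr API; the only point is that a logical shard is keyed by its id, while replicas are the physical copies that happen to live on particular nodes.

```python
from collections import defaultdict


class ClusterLayout:
    """Hypothetical model: logical shard id -> nodes holding a physical copy."""

    def __init__(self):
        # shard id -> set of node names hosting a replica of that shard
        self._replicas = defaultdict(set)

    def deploy(self, shard_id, node):
        """Record that `node` hosts a physical copy (replica) of `shard_id`."""
        self._replicas[shard_id].add(node)

    def logical_shards(self):
        """The collection is the set of unique shard ids, however many copies exist."""
        return sorted(self._replicas)

    def replicas(self, shard_id):
        """Physical copies of one logical shard, named by hosting node."""
        return sorted(self._replicas.get(shard_id, ()))

    def replication_factor(self, shard_id):
        """Number of physical copies; 1 means a single, unreplicated copy."""
        return len(self._replicas.get(shard_id, ()))


layout = ClusterLayout()
layout.deploy("shardX", "node1")
layout.deploy("shardX", "node2")  # second physical copy, same logical shard
layout.deploy("shardY", "node3")  # only one copy of this shard
```

In this sketch "shardX on node2" is simply one entry in replicas("shardX"), and a shard with a single copy is the replication-factor-1 case.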

>
>> The opaque model means it's more difficult to support updates.
>> IMHO it makes
>> sense to start with a set of stricter assumptions
>
> If we were building from scratch perhaps - but it seems like if we can
> just model what people do today with Solr (but just make it a lot
> easier), that's a good start.  The opaque model is what we have today,
> and it's conceptually simple... the complete collection consists of
> all the unique shard ids (or slices) you know about.

I would argue that the current model has been adopted out of necessity, 
and not because of the users' preference. Unless you want 
expert-level total control over what node runs what part of the index, 
isn't it much more convenient to delegate all the partitioning and 
deployment to your "search cluster" instead of managing the partitioning 
and deployment yourself? Users have to do it now because Solr has no 
mechanism for this.
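
As an illustration of what delegating the partitioning could look like, here is a minimal sketch in Python. It assumes a simple hash-modulo scheme over the document's unique id; the function name shard_for and the choice of CRC32 are my assumptions for the example, not a description of how Solr does or will do it.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a logical shard for a document by hashing its unique id.

    Hypothetical helper: CRC32 is used only because it is deterministic
    across processes; any stable hash would serve the same purpose.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# The cluster, not the user, decides where each document goes.
assignments = {doc_id: shard_for(doc_id, 4) for doc_id in ("doc-1", "doc-2", "doc-3")}
```

Because the same id always maps to the same logical shard, an update to an existing document can be routed to the shard that already holds it, which is exactly what the opaque model cannot promise.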

>
> And we don't need to support everything in this model - I think we
> should and will also support shards where Solr does all the
> partitioning and mapping of the ID space (pluggable of course) and
> then we can offer more services based on that knowledge.

Well, then if we don't intend to support updates in this iteration then 
perhaps there is no need to change anything in Solr, just extend Katta 
to run Solr searchers ... :P

>
>>> You've also used some slightly new terminology... "shard ID" as
>>> opposed to just shard, which reinforces the need for different
>>> terminology for the physical vs the logical.
>>
>> You got me ;) yes, when I say "shard" I mean the logical entity, as defined
>> by a set of documents - physical shard I would call a replica.
>
> I originally started off with "replica" too... but there may only be
> one copy of a physical shard, it seemed strange to call it a replica.

Yeah .. it's a replica with a replication factor of 1 :)

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

