lucene-solr-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Solr Cloud wiki and branch notes
Date Sat, 16 Jan 2010 20:11:58 GMT
On Sat, Jan 16, 2010 at 2:40 PM, Andrzej Bialecki <ab@getopt.org> wrote:
> I avoided the word "collection", because Solr deploys various cores under
> "collectionX" names, leading users to assume that core == collection.

For distributed search, it's already common to name the cores the same
thing for shards of the same collection on different boxes.  In fact,
we're currently using the core name as a default for the collection
name when bootstrapping.

>> Even the statement "what shard did that response come from" becomes
>> ambiguous since we could be talking about a part of the index (ShardX) or we
>> could be talking about the specific physical shard/server (it came
>> from node2).
>
> Agreed - but it could be as simple as qualifying this with "from shardX on
> node2".

Right - it's pretty clear there are both physical and logical
shards... but it's less clear to me at this point if distinguishing
them in the vocabulary helps or hurts.

> The opaque model means it's more difficult to support updates. IMHO it makes
> sense to start with a set of stricter assumptions

If we were building from scratch perhaps - but it seems like if we can
just model what people do today with Solr (but just make it a lot
easier), that's a good start.  The opaque model is what we have today,
and it's conceptually simple... the complete collection consists of
all the unique shard ids (or slices) you know about.
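To make the opaque model concrete, here is a minimal sketch. It is an illustration only, not code from the Solr cloud branch: the node names, shard ids, and the functions `logical_shards` and `plan_query` are all hypothetical. Each node simply reports the shard ids it hosts, the collection is taken to be the set of unique ids seen, and a query needs one physical copy per id.

```python
from collections import defaultdict


def logical_shards(hosted):
    """Group physical copies by logical shard id.

    `hosted` maps a node name to the list of shard ids that node serves.
    Returns a dict of shard id -> list of nodes holding a copy.
    """
    copies = defaultdict(list)
    for node, shard_ids in sorted(hosted.items()):
        for shard_id in shard_ids:
            copies[shard_id].append(node)
    return dict(copies)


def plan_query(hosted):
    """Pick one physical copy (a node) for every known logical shard."""
    by_shard = logical_shards(hosted)
    # Taking the first node is arbitrary; a real system would load-balance.
    return {shard_id: nodes[0] for shard_id, nodes in sorted(by_shard.items())}


# Hypothetical cluster: shardA and shardB each have two physical copies.
hosted = {
    "node1": ["shardA", "shardB"],
    "node2": ["shardB", "shardC"],
    "node3": ["shardA"],
}

collection = set(logical_shards(hosted))  # all unique shard ids known
plan = plan_query(hosted)                 # one physical copy per logical shard
```

In this sketch, "from shardB on node1" names both the logical shard and the physical copy, and nothing in the model needs to know which documents live in which shard.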

And we don't need to support everything in this model - I think we
should and will also support shards where Solr does all the
partitioning and mapping of the ID space (pluggable of course) and
then we can offer more services based on that knowledge.
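As a rough sketch of what a pluggable mapping of the ID space could look like (an assumption for illustration, not the design in the branch): a partitioner is any function from a document id and shard count to a shard index, and a hash-based default is used here. The names `hash_partitioner` and `route` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, List

# A partitioner maps (document id, number of shards) to a shard index.
Partitioner = Callable[[str, int], int]


def hash_partitioner(doc_id: str, num_shards: int) -> int:
    """Default strategy: a stable hash of the id, modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(doc_ids: Iterable[str], num_shards: int,
          partitioner: Partitioner = hash_partitioner) -> Dict[int, List[str]]:
    """Assign each document id to a shard index using the given partitioner."""
    assignment: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        assignment[partitioner(doc_id, num_shards)].append(doc_id)
    return assignment


# Swapping in a different strategy only requires passing another function,
# for example one that sends everything to shard 0.
everything_on_first = route(["a", "b", "c"], 3, lambda doc_id, n: 0)
hashed = route(["doc-1", "doc-2", "doc-3", "doc-4"], 4)
```

Because the mapping is known to the system, an update for a given id can be sent to the one shard that owns it, which is the kind of extra service the opaque model cannot offer.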

>> You've also used some slightly new terminology... "shard ID" as
>> opposed to just shard, which reinforces the need for different
>> terminology for the physical vs the logical.
>
> You got me ;) yes, when I say "shard" I mean the logical entity, as defined
> by a set of documents - physical shard I would call a replica.

I originally started off with "replica" too... but since there may only
be one copy of a physical shard, it seemed strange to call it a replica.

-Yonik
http://www.lucidimagination.com
