lucene-solr-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrCloud" by YonikSeeley
Date Thu, 03 Dec 2009 20:59:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: a start on some high level requirements / design.
http://wiki.apache.org/solr/SolrCloud

--------------------------------------------------

New page:
=== High level design goals ===
These are long term goals for SolrCloud.  Many of these features will not be developed in
the first versions, but we're designing for the long haul.

===== High Availability and Fault Tolerance =====
No external load balancer should be required.  We should eliminate any single points of failure
(i.e. start with a design that will allow us to add this feature at a later point with minimal
changes to things like the zookeeper schema)

===== Cluster resizing and rebalancing =====
To grow the cluster or to rebalance due to hotspots, shards should be resizable.  Pieces of
existing shards should be able to be split off and assigned to new servers.

===== Clients should not have to know about cluster layout. =====
A simple client (like a browser) should be able to hit any solr search server in the cluster
with a request, and that search server should be able to execute a distributed search against
the cluster as a whole, including load balancing and failover, to return the results to the
client.

===== Open API =====
The zookeeper schema should be well defined and public, allowing other software components
other than the master node to inspect and change the cluster via zookeeper.  A task like creating
a new collection eventually be achievable by creating the correct znodes/files in zookeeper,
w/o having to talk to any solr servers.

===== Support various levels of custom clusters =====
Support various splits between how much the user manages and how much solr manages.  One could
have a set of servers in zookeeper with defined indexes (shards or complete copies) and want
to just use the client search capabilities.  One may also want a traditional master/slave
relationship, even if more advanced options are available.

===== Support user specified partitioning =====
Partitioning of documents by geographic region, time, user, etc, brings huge performance benefits
by allowing only a part of the cluster to be queried.  This should be an option even in conjunction
with the most advanced modes of operation (automatic document->shard assignment, index
rebalancing, no single points of failure, etc) presumably by allowing user-specified hashcodes
for documents.

=== Entities we want to model or record in zookeeper: ===
'''host'''

 * The actual physical (or virtual, as it may be) machine
 * operating system, RAM, disk
 * some sort of metric for what level of load should be placed on this machine

'''node'''

 * a single JVM running one or more solr cores

'''collection'''

 * A collection of documents sharing the same schema

'''shard'''

 * A piece of a collection

'''core'''

 * A solr core (is there a better name we can use for this?)
 * If a core and a shard have a one-to-one mapping, they could be redundant.

'''role'''

 * is a core a master, a search node, a spell-checker node, etc
 * we probably want a generic way to map from role to different configs or config overrides

'''network topology'''

 * switch, rack, data center, etc

Resources

http://sourceforge.net/mailarchive/forum.php?forum_name=bailey-developers

Mime
View raw message