lucy-dev mailing list archives

From Dan <>
Subject [lucy-dev] Re: [lucy-user] ClusterSearcher
Date Sat, 12 Nov 2011 22:31:42 GMT
>> For Lucy as a whole, I think there are some meta-questions that should
>> be resolved before we go down this path.
>> 1) How core is this to Lucy's functionality?
Cluster support feels *very* core. A Lucy designed without great cluster
search/write support seems broken when scaled, or hacked at best.

Whether to build a cluster search implementation in the core Lucy
project? No idea.

>> 2) How much should we depend on outside libraries?

 Core Lucy:  I hope as little as possible.
 Cluster Search: Best tools for job.

>> 3) How independent should the Searcher and the Clients be?
 What do you mean by this? I keep getting tripped up because, with a
clustered search, there are 2 servers and 2 clients,
 and it's easy to be talking about the wrong ones (for me).

    (client #1)            (server #1 and client #2)        (server #2)
 python/perl user --->     Cluster request receiver    ---> nodes 1-N

Even if the "Cluster request receiver" is just a normal node, a master,
a search collator, etc., you still need a way to take requests and ask
your peers for data you don't have.
I'm sure someone has good names.
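
To make the two roles of the middle box concrete, here is a minimal sketch
in Python. It assumes each node is just an in-memory dict and that a hit's
score is a simple term count; search_node and cluster_search are
hypothetical names for illustration, not Lucy APIs.

```python
import heapq

def search_node(docs, term, limit):
    """Stand-in for one node's local search (server #2 in the diagram)."""
    hits = [(text.count(term), doc_id)
            for doc_id, text in docs.items() if term in text]
    return heapq.nlargest(limit, hits)

def cluster_search(nodes, term, limit=10):
    """Receiver: a server to the user and a client to every node."""
    partial = []
    for docs in nodes:
        # In a real cluster this would be a network request per node.
        partial.extend(search_node(docs, term, limit))
    # Collate the per-node top hits into one global top-N.
    return heapq.nlargest(limit, partial)

nodes = [
    {"a1": "lucy cluster search", "a2": "perl bindings"},
    {"b1": "cluster cluster write", "b2": "lucy index"},
]
top = cluster_search(nodes, "cluster", limit=2)
```

Each node only has to return its own top N for the receiver to produce a
correct global top N, which is why the receiver can be any ordinary node.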

>> 4) How future-proof and scalable do we want this solution to be?

Things currently rattling around in my head as we have been talking
about long-term cluster support:

Search or Write Optimized?
   I *think* most people would agree we should lean towards search optimized.
   But real-time/fast reopens is a big feature we currently have over
   others, and I would not want to lose it (or the perception of
   real-time).
   I also do not want to lose fast bulk adds.

What Type of Cluster?
   a. Multi-Master?
   b. Master-Slave(s)?
   c. Sharded-Master-Slave(s)?

When can a searcher see the new data?
 a. intermittently as it's replicated to all nodes?
 b. only after all nodes have a copy?
 c. Instantly for the client that added it, but as in (a) for everyone else?

Document Versioning.
 a. Last guy to write wins (deletes become interesting!)?
 b. Vector clocks with client resolution?
 c. Currently Lucy docs have no "primary key";
     this feels like it would need to change *if* versioning is
     required for clustering.
 d. not needed at all?
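
For option (b), a rough sketch of how vector clocks are compared, in
Python. It assumes a clock is a plain {node_id: counter} dict attached to
each doc version; compare and merge are hypothetical helper names, and
nothing like this exists in Lucy today.

```python
def compare(a, b):
    """Order two vector clocks given as {node_id: counter} dicts."""
    keys = set(a) | set(b)
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"   # conflicting writes: the client must resolve
    if a_ahead:
        return "after"        # a supersedes b
    if b_ahead:
        return "before"       # b supersedes a
    return "equal"

def merge(a, b):
    """Clock for a resolved version: element-wise maximum of both clocks."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```

Only the "concurrent" case needs client resolution. Note that the clock has
to be attached to something stable, which is where the missing primary key
in (c) comes in.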

Delete by Query.
 a. Seems a little tricky to me in cluster search with replication
      and possible out-of-order execution.
      I'm sure something is doable; it's just something to think about.
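
A toy illustration of the ordering problem, in Python. It assumes a replica
is a dict of doc_id to text and that every operation carries a sequence
number from a single writer; apply_ops and the operation tuples are made up
for this example.

```python
def apply_ops(ops):
    """Apply ('add', doc_id, text) and ('delete_by_query', term) in order."""
    index = {}
    for op in ops:
        if op[0] == "add":
            index[op[1]] = op[2]
        else:
            # Delete every doc currently matching the query term.
            index = {d: t for d, t in index.items() if op[1] not in t}
    return index

# (sequence number, operation) pairs as issued by the writer.
add = (1, ("add", "d1", "stale cluster doc"))
delete = (2, ("delete_by_query", "stale"))

in_order = apply_ops([op for _, op in [add, delete]])
out_of_order = apply_ops([op for _, op in [delete, add]])
# Sorting by sequence number before applying restores the intended result.
resequenced = apply_ops(
    [op for _, op in sorted([delete, add], key=lambda p: p[0])]
)
```

The replica that sees the delete first keeps a doc the writer meant to
remove. Sequencing is one possible fix under a single-writer assumption; a
multi-master setup would need something stronger.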

Replicating Data.
 a. Index copies?
 b. Segment copies?
 c. Doc copies?

Server failover properties.
 a. Auto-Rebalancing (shard/segment/index)?
 b. Always writeable model (multi-master)?
 c. Slave auto-promote?

Schema Changes:
 a. Will every node have to be updated simultaneously to prevent
      search/write failures?
      I'm talking about things like adding a new field, not changing a field.

