lucene-general mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Lucene-based Distributed Index Leveraging Hadoop
Date Thu, 07 Feb 2008 20:33:33 GMT
[ No longer cross-posting to java-dev and solr-user. ]

Andrzej Bialecki wrote:
>> A particular client should be able to provide a consistent read/write 
>> view by bonding to particular replicas of a shard.  Thus a user who 
>> makes a modification should be able to generally see that modification 
>> in results immediately, while other users, talking to different 
>> replicas, may not see it until synchronization is complete.
> 
> This requires that we use versioning, and that we have a "shard manager" 
> that knows the latest versions of each shard among the whole active set 
> - or that clients discover this dynamically by querying the shard 
> servers every now and then.

Yes, there needs to be a master that knows the shard hash function. 
However, I'm not sure what you mean by "versioning".  In general, there 
is no "latest" version of a shard.  Different shards have had different 
updates, and must, between themselves, resolve conflicts.  A client 
would generally talk to just one replica of each shard.  This is like 
CouchDB.  If different fields of a document are modified on different 
shards, then the changes can be merged.  Edits to a text field might 
sometimes even be mergeable.  But, in general, if two shards both contain
unmergeable changes to the same field, one will win and one will lose.
Similarly, if a document id is deleted in one shard and added in another 
at approximately the same time, then the addition would generally win. 
Thus if a single client switches which shard replica it talks to, then 
it could possibly lose deletions.  Or if different clients attempt to 
modify the same document, one client's changes may be overwritten by the
other.  This is similar to the way that Amazon's Dynamo works: in the 
case of failures, shopping cart deletions can be lost, and deleted 
things may thus re-appear in one's shopping cart.  This happens rarely, 
and confirmation is required before final sale, so it is not a big 
problem.  Perhaps conflicts can be flagged and manually resolved by the 
application.  Or perhaps clocks can be sufficiently synchronized that 
the vast majority of conflicts can be automatically resolved correctly.

Doug
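
The merge rules described in the message above could be sketched roughly
as follows. This is only an illustration under stated assumptions, not
code from Lucene, Hadoop, CouchDB or Dynamo: the names DocVersion,
merge_replicas and window are hypothetical, every field value is assumed
to carry a write timestamp from a reasonably synchronized clock, and the
later write is assumed to win when two copies of a field disagree.
Fields changed on only one replica are merged, a field changed on both
is flagged as a conflict, and a deletion is dropped when a write happened
at approximately the same time or later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (write timestamp, field value).
Stamped = Tuple[float, str]


@dataclass
class DocVersion:
    """One replica's copy of a document (illustrative only)."""
    fields: Dict[str, Stamped] = field(default_factory=dict)
    deleted_at: Optional[float] = None  # tombstone time, None if live


def merge_replicas(a: DocVersion, b: DocVersion,
                   window: float = 1.0) -> Tuple[DocVersion, List[str]]:
    """Merge two replica copies; return the result and flagged fields.

    `window` is an assumed tolerance (in clock units) for what counts
    as "approximately the same time" when a delete races with an add.
    """
    merged: Dict[str, Stamped] = {}
    conflicts: List[str] = []
    for name in sorted(set(a.fields) | set(b.fields)):
        va, vb = a.fields.get(name), b.fields.get(name)
        if va is None or vb is None:
            # Field present on only one replica: take it as-is.
            merged[name] = va if va is not None else vb
        else:
            if va[1] != vb[1]:
                # Values disagree. Without version history we cannot
                # tell a stale copy from a concurrent edit, so flag it.
                conflicts.append(name)
            # One wins, one loses: later timestamp, ties broken by value.
            merged[name] = max(va, vb)

    tombstones = [t for t in (a.deleted_at, b.deleted_at) if t is not None]
    tombstone = max(tombstones) if tombstones else None
    last_write = max((ts for ts, _ in merged.values()), default=None)
    # The addition wins if it is not clearly older than the deletion.
    if (tombstone is not None and last_write is not None
            and last_write >= tombstone - window):
        tombstone = None
    return DocVersion(merged, tombstone), conflicts
```

With this sketch, a document deleted on one replica at time 5.5 and
written on another at time 5.0 survives the merge, which is the
re-appearing shopping cart item mentioned above. The returned list of
conflicting field names is what an application could use to flag
conflicts for manual resolution.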