lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
Date Thu, 02 May 2013 16:08:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647655#comment-13647655
] 

Shai Erera commented on LUCENE-4975:
------------------------------------

So here's an overview of how the Replicator works (it's also documented under oal.replicator.package.html):

At a high-level, producers (e.g. indexer) publish Revisions, and consumers update to the latest
Revision available. Like SVN, if a client is on rev1 and the server has rev4, the next update
request will upgrade the client to rev4, skipping all intermediate revisions.
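
To make the "skip intermediate revisions" behavior concrete, here is a minimal, self-contained sketch. It does not use the real Replicator classes: RevisionSkipSketch, publish() and checkForUpdate() are hypothetical names invented for illustration, and revisions are reduced to plain integers.

```java
// Hypothetical stand-in for a server that tracks only the latest revision.
// This is NOT the Lucene Replicator API, just an illustration of the idea.
class RevisionSkipSketch {
    private int latestVersion = 0; // 0 means nothing has been published yet

    /** Publishes a new revision; only the newest one is offered to clients. */
    void publish() {
        latestVersion++;
    }

    /** Returns the version the client should move to, or -1 if it is up to date. */
    int checkForUpdate(int clientVersion) {
        return clientVersion < latestVersion ? latestVersion : -1;
    }

    public static void main(String[] args) {
        RevisionSkipSketch server = new RevisionSkipSketch();
        for (int i = 0; i < 4; i++) {
            server.publish(); // rev1 .. rev4
        }
        int clientVersion = 1;
        int target = server.checkForUpdate(clientVersion);
        // prints "client moves from rev1 to rev4"
        System.out.println("client moves from rev" + clientVersion + " to rev" + target);
    }
}
```

The client never sees rev2 or rev3, because the server only answers with the latest revision it holds.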

The Replicator offers two implementations at the moment: LocalReplicator, to be used on
the server side, and HttpReplicator, to be used by clients to e.g. update over HTTP. In the
future, we may want to add other Replicator implementations, e.g. rsync, torrent... For HTTP,
the package also provides a ReplicationService which acts on the HTTP servlet request/response
following some API specification. In that sense, the HttpReplicator expects a certain HTTP
impl on the server side, so ReplicationService helps you by implementing that API. The reason
it's not a servlet is so that you can plug it into your application servlet freely.

A Revision is basically a list of files and sources. For example, IndexRevision contains the
list of files in an IndexCommit (and only one source), while IndexAndTaxonomyRevision contains
the list of files from both IndexCommits with corresponding sources (index/taxonomy). When
the server publishes either of these two revisions, the IndexCommits are snapshotted so that
files aren't deleted, and the Replicator serves file requests (by clients) from the Revision.
The Revision is also responsible for releasing itself -- this is done automatically by the
Replicator which releases a revision when it's no longer needed (i.e. there's a new one already)
and there are no clients that currently replicate its files.
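
The release rule above (a revision is released once it has been superseded and no client is still copying its files) can be sketched with a simple per-revision session count. This is only an illustration under assumed names: RevisionReleaseSketch, openSession(), closeSession() and isReleased() are hypothetical and are not the Replicator's actual methods.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for when a published revision may be released.
class RevisionReleaseSketch {
    private int current = -1; // -1 means nothing published yet
    private final Map<Integer, Integer> sessions = new HashMap<>(); // version -> clients copying
    private final Set<Integer> released = new HashSet<>();

    /** Publishes a new revision and releases the previous one if nobody is using it. */
    void publish(int version) {
        int previous = current;
        current = version;
        if (previous != -1) {
            maybeRelease(previous);
        }
    }

    /** A client starts replicating the current revision. */
    int openSession() {
        if (current == -1) {
            throw new IllegalStateException("no revision published yet");
        }
        sessions.merge(current, 1, Integer::sum);
        return current;
    }

    /** A client finished (or gave up) replicating the given revision. */
    void closeSession(int version) {
        sessions.merge(version, -1, Integer::sum);
        maybeRelease(version);
    }

    private void maybeRelease(int version) {
        boolean superseded = version != current;
        boolean idle = sessions.getOrDefault(version, 0) == 0;
        if (superseded && idle) {
            released.add(version); // a real revision would e.g. free its snapshot here
        }
    }

    boolean isReleased(int version) {
        return released.contains(version);
    }

    public static void main(String[] args) {
        RevisionReleaseSketch server = new RevisionReleaseSketch();
        server.publish(1);
        int v = server.openSession();             // a client starts copying rev1
        server.publish(2);                        // rev1 is superseded but still in use
        System.out.println(server.isReleased(1)); // false
        server.closeSession(v);                   // last client is done with rev1
        System.out.println(server.isReleased(1)); // true
    }
}
```

Both conditions have to hold before a revision is released, which is why publishing rev2 alone does not free rev1 while a client is still mid-copy.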

On the client side, the package offers a ReplicationClient class which can either be invoked
manually or start its update thread to periodically check for updates. The client is given
a ReplicationHandler (two matching implementations: IndexReplicationHandler and IndexAndTaxonomyReplicationHandler)
which is responsible for acting on the replicated files. The client first obtains all needed files
(i.e. those that the new Revision offers and that the client is still missing), and after they
were all successfully copied over, the handler is invoked. Both handlers copy the files from
their temporary location to the index directories, fsync them and kiss the index such that
unused files are deleted. You can provide each handler a Callable which is invoked after the
index has been safely and successfully updated, so you can e.g. searcherManager.maybeReopen().
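
The "needed files" step described above boils down to a set difference between what the new Revision lists and what the client already has. Here is a minimal sketch of that step only; RequiredFilesSketch and requiredFiles() are hypothetical names, the file names are made up, and the real ReplicationClient may decide this differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: which files of a new revision must the client still fetch?
class RequiredFilesSketch {
    /** Returns the revision's files that are not present locally, in revision order. */
    static List<String> requiredFiles(List<String> revisionFiles, Set<String> localFiles) {
        List<String> missing = new ArrayList<>();
        for (String file : revisionFiles) {
            if (!localFiles.contains(file)) {
                missing.add(file);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example file names only; they are not taken from a real index.
        List<String> revisionFiles = Arrays.asList("_0.cfs", "_1.cfs", "segments_2");
        Set<String> localFiles = new HashSet<>(Arrays.asList("_0.cfs", "segments_1"));
        // prints [_1.cfs, segments_2]
        System.out.println(requiredFiles(revisionFiles, localFiles));
    }
}
```

Only after every file in that list has been copied to the temporary location would the handler be invoked, so a failed copy never touches the live index directory.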

Here's a general code example that explains how to work with the Replicator:

{code}
// ++++++++++++++ SERVER SIDE ++++++++++++++ // 
IndexWriter publishWriter; // the writer used for indexing
Replicator replicator = new LocalReplicator();
replicator.publish(new IndexRevision(publishWriter));

// ++++++++++++++ CLIENT SIDE ++++++++++++++ // 
// either LocalReplicator, or HttpReplicator if client and server are on different nodes
Replicator replicator;

// callback invoked after handler finished handling the revision and e.g. can reopen the reader.
Callable<Boolean> callback = null; // can also be null if no callback is needed
ReplicationHandler handler = new IndexReplicationHandler(indexDir, callback);
SourceDirectoryFactory factory = new PerSessionDirectoryFactory(workDir);
ReplicationClient client = new ReplicationClient(replicator, handler, factory);

// invoke client manually
client.updateNow();
	
// or, periodically
client.startUpdateThread(100); // check for update every 100 milliseconds
{code}

The package of course comes with unit tests, though I'm sure there's room for improvement
(there always is!).
                
> Add Replication module to Lucene
> --------------------------------
>
>                 Key: LUCENE-4975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4975
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>
> I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes for e.g. high-availability, taking hot backups etc.
> I will upload a patch soon where I'll describe in general how it works.
