lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4731) New ReplicatingDirectory mirrors index files to HDFS
Date Sun, 16 Mar 2014 04:47:31 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-4731:
---------------------------------

    Fix Version/s:     (was: 4.7)
                   4.8

> New ReplicatingDirectory mirrors index files to HDFS
> ----------------------------------------------------
>
>                 Key: LUCENE-4731
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4731
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/store
>            Reporter: David Arthur
>             Fix For: 4.8
>
>         Attachments: ReplicatingDirectory.java
>
>
> I've been working on a Directory implementation that mirrors the index files to HDFS
(or another Hadoop-supported FileSystem).
> A ReplicatingDirectory delegates all calls to an underlying Directory (supplied in the
constructor). The only hooks are the deleteFile and sync calls. We submit deletes and replications
to a single scheduler thread to keep things serialized. During a sync call, if "segments.gen"
is seen in the list of files, we know a commit is finishing. After calling the delegate's
sync method, we initiate an asynchronous replication as follows.
> * Read segments.gen (before leaving ReplicatingDirectory#sync), save the values for later
> * Get a list of local files from ReplicatingDirectory#listAll before leaving ReplicatingDirectory#sync
> * Submit replication task (DirectoryReplicator) to scheduler thread
> * Compare local files to remote files, determine which remote files get deleted, and
which need to get copied
> * Submit a thread to copy each file (one thread per file)
> * Submit a thread to delete each file (one thread per file)
> * Submit a "finalizer" thread. This thread waits on the previous two batches of threads
to finish. Once finished, this thread generates a new "segments.gen" remotely (using the version
and generation number previously read in).
> I have no idea where this would belong in the Lucene project, so I'll just attach the
standalone class instead of a patch. It introduces dependencies on Hadoop core (and all the
deps that brings with it).
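
The replication steps in the description above could be sketched roughly as follows. This is not the attached ReplicatingDirectory.java: it uses only the JDK, and RemoteStore, InMemoryStore, Plan, plan and replicate are hypothetical names standing in for the Hadoop FileSystem calls. It also assumes files can be compared by name alone and that "segments.gen" is never copied or deleted directly, since the description says it is regenerated remotely at the end. As a simplification, the calling thread plays the role of the "finalizer" by waiting for the copy and delete tasks itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReplicationSketch {

    /** Hypothetical stand-in for the remote file system (HDFS in the issue). */
    public interface RemoteStore {
        Set<String> list();
        void copy(String name);
        void delete(String name);
        void writeSegmentsGen(long version, long generation);
    }

    /** Result of comparing local and remote file names. */
    public static final class Plan {
        public final Set<String> toCopy;
        public final Set<String> toDelete;
        Plan(Set<String> toCopy, Set<String> toDelete) {
            this.toCopy = toCopy;
            this.toDelete = toDelete;
        }
    }

    /** Copy what exists only locally, delete what exists only remotely. */
    public static Plan plan(Set<String> local, Set<String> remote) {
        Set<String> toCopy = new TreeSet<>(local);
        toCopy.removeAll(remote);
        Set<String> toDelete = new TreeSet<>(remote);
        toDelete.removeAll(local);
        // Assumption: segments.gen is rewritten by the final step, never copied or deleted.
        toCopy.remove("segments.gen");
        toDelete.remove("segments.gen");
        return new Plan(toCopy, toDelete);
    }

    /** One task per file, then write segments.gen once every task has finished. */
    public static void replicate(Set<String> local, RemoteStore remote, long version,
                                 long generation, ExecutorService workers)
            throws InterruptedException, ExecutionException {
        Plan p = plan(local, remote.list());
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String f : p.toCopy) {
            tasks.add(() -> { remote.copy(f); return null; });
        }
        for (String f : p.toDelete) {
            tasks.add(() -> { remote.delete(f); return null; });
        }
        // invokeAll blocks until all tasks complete; get() rethrows any task failure.
        for (Future<Void> result : workers.invokeAll(tasks)) {
            result.get();
        }
        remote.writeSegmentsGen(version, generation);
    }

    /** Hypothetical in-memory remote, only here so the sketch can be run. */
    public static final class InMemoryStore implements RemoteStore {
        public final Set<String> files = new ConcurrentSkipListSet<>();
        public volatile long version = -1, generation = -1;
        public Set<String> list() { return new TreeSet<>(files); }
        public void copy(String name) { files.add(name); }
        public void delete(String name) { files.remove(name); }
        public void writeSegmentsGen(long v, long g) {
            version = v;
            generation = g;
            files.add("segments.gen");
        }
    }

    public static void main(String[] args) throws Exception {
        InMemoryStore remote = new InMemoryStore();
        remote.files.addAll(Arrays.asList("_0.cfs", "segments_1"));
        Set<String> local = new TreeSet<>(Arrays.asList("_1.cfs", "segments_2", "segments.gen"));
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            replicate(local, remote, 2L, 2L, workers);
        } finally {
            workers.shutdown();
        }
        System.out.println(remote.list());
    }
}
```

Writing "segments.gen" only after every copy and delete has completed means the remote side should not advertise a commit whose files are still in flight, which appears to be the point of the finalizer step in the description.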



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
