lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9055) Make collection backup/restore extensible
Date Tue, 03 May 2016 17:21:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269135#comment-15269135 ]

David Smiley commented on SOLR-9055:
------------------------------------

(p.s. use {{bq.}} to quote)

bq. (me) I have a general question about HDFS; I have no real experience with it. I wonder
if Java's NIO file abstractions could be used so that we don't need separate code. If so,
it would be wonderful: simpler, and less code to maintain. See https://github.com/damiencarol/jsr203-hadoop
What do you think?
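To make the NIO idea concrete, here is a minimal sketch that copies index files using only `java.nio.file`. `NioBackupCopier` is a hypothetical name, not code from any patch. It runs here against the default local file system; the assumption, untested in this sketch, is that once an HDFS `FileSystemProvider` such as jsr203-hadoop is installed, the same code would work on `hdfs://` paths, because `Paths.get(URI)` picks the provider from the URI scheme.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not from any SOLR-9055 patch): copies index files with
// java.nio.file only, so the storage backend is chosen by each Path's provider.
final class NioBackupCopier {
    private NioBackupCopier() {}

    // Paths.get(URI) locates the installed FileSystemProvider for the URI scheme
    // ("file" is always available; "hdfs" would need a provider such as jsr203-hadoop).
    static Path resolve(URI location) {
        return Paths.get(location);
    }

    // Copies the regular files directly inside indexDir into backupDir (created if
    // missing) and returns the copied file names in sorted order.
    static List<String> copyIndexFiles(Path indexDir, Path backupDir) {
        try {
            Files.createDirectories(backupDir);
            List<Path> sources;
            try (Stream<Path> listing = Files.list(indexDir)) {
                sources = listing.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path src : sources) {
                // Resolve by String, not Path: resolving a Path that belongs to a
                // different provider would throw ProviderMismatchException.
                Path dst = backupDir.resolve(src.getFileName().toString());
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            }
            return sources.stream()
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`Files.copy(Path, Path, ...)` also accepts a source and target from different providers, which is what would let a local index be copied into a remote backup location without provider-specific code.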

bq. (Gadre) Although integrating HDFS with the Java NIO API sounds interesting, I would prefer
that it be provided directly by the HDFS client library rather than by a third-party library, which may
or may not be supported in the future. Also, since Solr provides an HDFS-backed Directory implementation,
it probably makes sense to reuse it.

Any thoughts on this one [~markrmiller@gmail.com] or [~gchanan] perhaps?

bq. However, if we want to keep things simple, we can choose not to provide separate APIs to
configure "repositories". Instead, we can just pick the same file-system used to store the
indexed data. That means that for the local file-system, the backup will be stored on a shared
file-system using the SimpleFSDirectory implementation, and for HDFS we will use the HdfsDirectory
impl. Makes sense?

I understand what you mean, but it seems a shame, and it loses the extensibility we want. I
think what this comes down to is whether we should reuse the Lucene Directory API for moving data
in/out of the backup location, or use something else.
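One possible shape for the "something else" is a small repository interface that speaks in URIs and streams instead of Lucene's Directory API, with a file-copy strategy written only against that interface. This is a hypothetical sketch: `BackupRepository`, `LocalFileSystemRepository`, `FileCopyStrategy` and their methods are illustrative names, not the API in the attached patch or in Solr.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical repository abstraction: callers move bytes in and out of a backup
// location without knowing whether it is a local disk, HDFS, S3, etc.
interface BackupRepository {
    URI resolve(URI base, String name);        // child location under base
    boolean exists(URI location);
    void createDirectory(URI location);        // creates parents as needed
    void copyFrom(InputStream in, URI target); // overwrites an existing file
    InputStream openInput(URI source);         // caller must close the stream
}

// Local-file-system implementation on top of java.nio.file; an HDFS or S3
// implementation would implement the same interface with its own client library.
final class LocalFileSystemRepository implements BackupRepository {
    @Override
    public URI resolve(URI base, String name) {
        return Paths.get(base).resolve(name).toUri();
    }

    @Override
    public boolean exists(URI location) {
        return Files.exists(Paths.get(location));
    }

    @Override
    public void createDirectory(URI location) {
        try {
            Files.createDirectories(Paths.get(location));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void copyFrom(InputStream in, URI target) {
        try {
            Files.copy(in, Paths.get(target), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public InputStream openInput(URI source) {
        try {
            return Files.newInputStream(Paths.get(source));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical "file copy" strategy: streams one local index file into whatever
// repository it is given, using only the interface above.
final class FileCopyStrategy {
    private FileCopyStrategy() {}

    static void backup(Path indexFile, BackupRepository repo, URI backupDir) {
        repo.createDirectory(backupDir);
        try (InputStream in = Files.newInputStream(indexFile)) {
            repo.copyFrom(in, repo.resolve(backupDir, indexFile.getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design point is that the strategy never sees a Directory or a provider-specific path type, so registering a new storage backend would mean adding one more `BackupRepository` implementation.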

bq. I think the main problem here is identifying the type of file-system used for a given collection
at the Overseer. (A Solr core, on the other hand, already has a Directory factory reference,
so we can instantiate the appropriate directory in the snapshooter.)

It was suggested early in SOLR-5750 that the location param should have a protocol/impl scheme
URL prefix (assuming {{file://}} if none is specified). That may help the Overseer? Or, if you
mean it needs to know the directory impl of the live indexes, I imagine it could look
this up the same way Solr's admin screen does (it shows the impl factory).
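The scheme-prefix idea is small enough to sketch. In this hypothetical helper (`BackupLocations` is an illustrative name), a location with no scheme is treated as a local path, i.e. `file://` is assumed, and the scheme alone tells the Overseer which kind of repository to use, without consulting any core's directory factory.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper: normalizes a backup "location" parameter into a URI,
// assuming file:// when no scheme is given, so the Overseer can tell which
// repository implementation to use from the parameter alone.
final class BackupLocations {
    private BackupLocations() {}

    static URI parse(String location) {
        try {
            URI uri = new URI(location);
            String scheme = uri.getScheme();
            // A one-letter "scheme" is most likely a Windows drive letter (C:/backups).
            if (scheme != null && scheme.length() > 1) {
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. contains spaces or backslashes): treat as a local path.
        }
        return Paths.get(location).toAbsolutePath().toUri();
    }

    // Lower-cased scheme, e.g. "file", "hdfs" or "s3".
    static String repositoryType(String location) {
        return parse(location).getScheme().toLowerCase(Locale.ROOT);
    }
}
```

For example, `repositoryType("hdfs://namenode:8020/solr/backups")` gives `"hdfs"` and `repositoryType("/mnt/shared/backups")` gives `"file"`.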


I doubt I'll have time to help much more here... I'm a bit behind on my workload.



> Make collection backup/restore extensible
> -----------------------------------------
>
>                 Key: SOLR-9055
>                 URL: https://issues.apache.org/jira/browse/SOLR-9055
>             Project: Solr
>          Issue Type: Task
>            Reporter: Hrishikesh Gadre
>            Assignee: Mark Miller
>         Attachments: SOLR-9055.patch
>
>
> SOLR-5750 implemented the backup/restore API for Solr. This JIRA is to track the code cleanup/refactoring.
> Specifically, the following improvements should be made:
> - Add the Solr/Lucene version, to check compatibility between the backup version and the
>   version of Solr on which it is being restored.
> - Add a backup implementation version, to check compatibility between the "restore"
>   implementation and the backup format.
> - Introduce a Strategy interface to define how the Solr index data is backed up (e.g.
>   using a file-copy approach).
> - Introduce a Repository interface to define the file-system used to store the backup
>   data. (Currently this works only with the local file system, but it can be extended.) This
>   should be enhanced to support "registering" repositories (e.g. HDFS, S3, etc.).
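The two version checks in the list could look roughly like this minimal sketch. `BackupCompatibility`, the property keys and the format number are hypothetical. The rule that a restore accepts indexes from the same or the previous Lucene major version is an assumption modeled on Lucene's usual index back-compatibility, not something this issue specifies.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch of the two proposed compatibility checks. The key names,
// the format number and the "same or previous Lucene major" rule are
// assumptions for illustration, not taken from the SOLR-9055 patch.
final class BackupCompatibility {
    static final String LUCENE_VERSION_KEY = "lucene.version";
    static final String FORMAT_VERSION_KEY = "backup.format.version";
    static final int CURRENT_FORMAT_VERSION = 1;

    private BackupCompatibility() {}

    // Metadata to store alongside the backup when it is created.
    static Properties describe(String luceneVersion) {
        Properties p = new Properties();
        p.setProperty(LUCENE_VERSION_KEY, luceneVersion);
        p.setProperty(FORMAT_VERSION_KEY, Integer.toString(CURRENT_FORMAT_VERSION));
        return p;
    }

    private static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    // Empty if the backup can be restored here, otherwise the reason it cannot.
    static Optional<String> check(Properties backup, String runningLuceneVersion) {
        String written = backup.getProperty(LUCENE_VERSION_KEY);
        String format = backup.getProperty(FORMAT_VERSION_KEY);
        if (written == null || format == null) {
            return Optional.of("backup metadata is incomplete");
        }
        if (Integer.parseInt(format) > CURRENT_FORMAT_VERSION) {
            return Optional.of("backup format " + format + " is newer than this restore implementation supports");
        }
        int backupMajor = major(written);
        int runningMajor = major(runningLuceneVersion);
        if (backupMajor > runningMajor) {
            return Optional.of("index was written by a newer Lucene (" + written + ")");
        }
        if (backupMajor < runningMajor - 1) {
            return Optional.of("index was written by Lucene " + written + ", more than one major version back");
        }
        return Optional.empty();
    }
}
```

Keeping the format version separate from the Lucene version means the backup layout can evolve (say, a new Strategy) without being confused with index-format compatibility.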



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

