lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4190) IndexWriter deletes non-Lucene files
Date Mon, 09 Jul 2012 19:07:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409739#comment-13409739 ]

Michael McCandless commented on LUCENE-4190:
--------------------------------------------

bq. I think that the way to "bound" the namespace of files is to put everything in a subdirectory
of the index directory chosen by the user and control the name of that subdirectory, making
it clear that this is semi-private to Lucene and that all files in that subdirectory are fair
game.

I think this idea is compelling.  It would clearly succeed in creating
a private namespace (unless someone had happened to separately create
a directory named 'lucene.index' there, which seems very unlikely).

For back compat, if we open an existing index (not in the subdir),
we'd have to just continue writing there?  But when creating a new
index, we'd create the subdir and write files into it (and maybe we
fix IndexUpgrader to somehow move the files to the subdir?).
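
To make the back-compat rule above concrete, here is a minimal sketch using
only java.nio.file. It is not Lucene code: the class IndexLocation, its
methods, and the test "a file whose name starts with segments marks an
existing index" are assumptions made for illustration. Only the subdirectory
name 'lucene.index' comes from this comment.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Lucene: decides where index files should live. */
final class IndexLocation {
    // Subdirectory name mentioned in the comment; illustrative only.
    static final String PRIVATE_SUBDIR = "lucene.index";

    /** Assumption: a regular file named "segments*" directly in dir marks an existing index. */
    static boolean hasLegacyIndex(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "segments*")) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Existing index: keep writing in place. New index: create and use the private subdir. */
    static Path resolve(Path userDir) throws IOException {
        if (hasLegacyIndex(userDir)) {
            return userDir;
        }
        Path sub = userDir.resolve(PRIVATE_SUBDIR);
        Files.createDirectories(sub);
        return sub;
    }
}
```

Note that this sketch never moves an old index into the subdir, which is
exactly the migration question raised below.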

We could make this change for 4.0 Beta, but it wouldn't be until 6.0
that we could remove the back compat (which is fine).

Hmm, but then an old index would never actually migrate "forward".  Not
quite sure how to do the transition ...

It seems like a rather big change, which makes me nervous ... and I
imagine apps will be confused, e.g. when they try to replicate or back up
(but they'd just have to fix themselves: apps can't rely on the index
files).

It seems less risky to start with the current patch; in fact I think
the two changes can be done separately?

                
> IndexWriter deletes non-Lucene files
> ------------------------------------
>
>                 Key: LUCENE-4190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4190
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Robert Muir
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4190.patch, LUCENE-4190.patch, LUCENE-4190.patch, LUCENE-4190.patch
>
>
> Carl Austin raised a good issue in a comment on my Lucene 4.0.0 alpha blog post: http://blog.mikemccandless.com/2012/07/lucene-400-alpha-at-long-last.html
> IndexWriter will now (as of 4.0) delete all foreign files from the index directory. We made this change because Codecs are free to write to any files now, so the space of filenames is hard to "bound".
> But if the user accidentally uses the wrong directory (eg c:/) then we will in fact delete important stuff.
> I think we can at least use some simple criteria (must start with _, maybe must fit certain pattern eg _<base36>(_X).Y), so we are much less likely to delete a non-Lucene file....
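
The filename criteria in the description could be sketched roughly as below.
This is an illustration only, not the attached patch: the class
LuceneFileNameFilter, the exact regex, and the extra rule that names starting
with "segments" are also kept are assumptions. The regex is one possible
reading of the _<base36>(_X).Y shape.

```java
import java.util.regex.Pattern;

/** Hypothetical filter, not part of Lucene: guesses whether a file name belongs to an index. */
final class LuceneFileNameFilter {
    // Assumed shape: "_" + base-36 segment name + optional "_suffix" parts + "." + extension.
    static final Pattern SEGMENT_FILE =
        Pattern.compile("_[0-9a-z]+(_[0-9A-Za-z]+)*\\.[0-9A-Za-z]+");

    /** Returns true only for names that look like index files; everything else is left alone. */
    static boolean looksLikeIndexFile(String name) {
        // Assumption: segments files do not start with "_" but still belong to the index.
        return name.startsWith("segments") || SEGMENT_FILE.matcher(name).matches();
    }
}
```

With a check like this, a deleter pointed at the wrong directory would skip
names such as autoexec.bat, since they neither start with _ nor with
"segments".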

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
