Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Message-ID: <15350093.1173987070017.JavaMail.jira@brutus>
Date: Thu, 15 Mar 2007 12:31:10 -0700 (PDT)
From: "Doron Cohen (JIRA)" <jira@apache.org>
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-710) Implement "point in time" searching
 without relying on filesystem semantics
In-Reply-To: <21389840.1163533599114.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/LUCENE-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481290 ] 

Doron Cohen commented on LUCENE-710:
------------------------------------

> If for a given application the developer is concerned 
> about safety (losing index due to crash), then the 
> normal default policy should be used. 

Spooky... what if onCommit() also deletes all commits?  
(Might this be a pit for users to fall in...?)

I added warnings about this in IndexDeletionPolicy methods.

Just commiited review.take2 + these comments.


> Implement "point in time" searching without relying on filesystem semantics
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-710
>                 URL: https://issues.apache.org/jira/browse/LUCENE-710
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>         Assigned To: Michael McCandless
>            Priority: Minor
>         Attachments: 710.review.diff, 710.review.take2.diff, LUCENE-710.patch, LUCENE-710.take2.patch, LUCENE-710.take3.patch
>
>
> This was touched on in recent discussion on dev list:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/41700#41700
> and then more recently on the user list:
>   http://www.gossamer-threads.com/lists/lucene/java-user/42088
> Lucene's "point in time" searching currently relies on how the
> underlying storage handles deletion files that are held open for
> reading.
> This is highly variable across filesystems.  For example, UNIX-like
> filesystems usually do "close on last delete", and Windows filesystem
> typically refuses to delete a file open for reading (so Lucene retries
> later).  But NFS just removes the file out from under the reader, and
> for that reason "point in time" searching doesn't work on NFS
> (see LUCENE-673 ).
> With the lockless commits changes (LUCENE-701 ), it's quite simple to
> re-implement "point in time searching" so as to not rely on filesystem
> semantics: we can just keep more than the last segments_N file (as
> well as all files they reference).
> This is also in keeping with the design goal of "rely on as little as
> possible from the filesystem".  EG with lockless we no longer re-use
> filenames (don't rely on filesystem cache being coherent) and we no
> longer use file renaming (because on Windows it can fails).  This
> would be another step of not relying on semantics of "deleting open
> files".  The less we require from filesystem the more portable Lucene
> will be!
> Where it gets interesting is what "policy" we would then use for
> removing segments_N files.  The policy now is "remove all but the last
> one".  I think we would keep this policy as the default.  Then you
> could imagine other policies:
>   * Keep past N day's worth
>   * Keep the last N
>   * Keep only those in active use by a reader somewhere (note: tricky
>     how to reliably figure this out when readers have crashed, etc.)
>   * Keep those "marked" as rollback points by some transaction, or
>     marked explicitly as a "snaphshot".
>   * Or, roll your own: the "policy" would be an interface or abstract
>     class and you could make your own implementation.
> I think for this issue we could just create the framework
> (interface/abstract class for "policy" and invoke it from
> IndexFileDeleter) and then implement the current policy (delete all
> but most recent segments_N) as the default policy.
> In separate issue(s) we could then create the above more interesting
> policies.
> I think there are some important advantages to doing this:
>   * "Point in time" searching would work on NFS (it doesn't now
>     because NFS doesn't do "delete on last close"; see LUCENE-673 )
>     and any other Directory implementations that don't work
>     currently.
>   * Transactional semantics become a possibility: you can set a
>     snapshot, do a bunch of stuff to your index, and then rollback to
>     the snapshot at a later time.
>   * If a reader crashes or machine gets rebooted, etc, it could choose
>     to re-open the snapshot it had previously been using, whereas now
>     the reader must always switch to the last commit point.
>   * Searchers could search the same snapshot for follow-on actions.
>     Meaning, user does search, then next page, drill down (Solr),
>     drill up, etc.  These are each separate trips to the server and if
>     searcher has been re-opened, user can get inconsistent results (=
>     lost trust).  But with, one series of search interactions could
>     explicitly stay on the snapshot it had started with.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org