lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Why read past EOF
Date Wed, 08 Feb 2012 10:57:16 GMT
Hmm, there's a problem with the logic here (sorry: this is my fault --
my prior suggestion is flat out wrong!).

The problem is... say you commit once, creating commit point 1.  Two
hours later, you commit again, creating commit point 2.  The bug is,
at this point, immediately on committing commit point 2, this deletion
policy will go and remove commit point 1.  Instead, it's supposed to
wait 10 minutes to do so.

So... I think you should go back to using System.currentTimeMillis()
as "the present".  And then, only when the newest commit is more than
10 minutes old, are you allowed to delete the commits before it.  That
should work?
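
A minimal sketch of that rule, written as plain Java so it runs without
Lucene on the classpath.  DelayedDeletionRule and deletableIndices are
hypothetical names, not part of Lucene.  Timestamps are assumed to be epoch
milliseconds, listed oldest first, which is the order in which Lucene hands
commits to the deletion policy.  In onCommit you would pass
System.currentTimeMillis() as nowMillis and call delete() on each commit
whose index comes back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene class): decides which commits may be
// deleted, using the wall clock as "the present".
final class DelayedDeletionRule {
    private DelayedDeletionRule() {}

    /**
     * @param commitTimestampsMillis commit timestamps, oldest first
     * @param nowMillis              the present, e.g. System.currentTimeMillis()
     * @param retentionMillis        how old the newest commit must be before
     *                               the commits before it may be deleted
     * @return indices of the commits that may be deleted (never the newest)
     */
    static List<Integer> deletableIndices(List<Long> commitTimestampsMillis,
                                          long nowMillis,
                                          long retentionMillis) {
        List<Integer> deletable = new ArrayList<>();
        if (commitTimestampsMillis.isEmpty()) {
            return deletable;
        }
        int newest = commitTimestampsMillis.size() - 1;
        long newestAgeMillis = nowMillis - commitTimestampsMillis.get(newest);
        // Only once the newest commit itself is old enough do we assume
        // every reader has moved onto it and release everything before it.
        if (newestAgeMillis > retentionMillis) {
            for (int i = 0; i < newest; i++) {
                deletable.add(i);
            }
        }
        return deletable;
    }
}
```

With the two-commit scenario above (commits two hours apart, 10 minute
retention), this returns nothing at the moment commit point 2 is created,
and returns commit point 1 only when asked again more than 10 minutes
later.  Note that Lucene only calls onCommit when a commit happens, so the
actual removal takes place on the next commit after the window has passed.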

However: you should leave a margin of error, because say the reader
takes 10 seconds to reopen + warm/cutover all search threads... then,
if timing is unlucky, you can still remove a commit point being used
by a reader.  I would leave a comfortable margin, eg if you reopen
readers every 10 minutes, then delete commits older than 15 or 20
minutes.  If commits are rare then leaving a fat margin here will cost
nothing in practice... and if there is some clock change
(System.currentTimeMillis() suddenly jumps, maybe from daylight
savings time, maybe from aggressive clock syncing, whatever), you have
some margin....
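
As a rough illustration of the margin, here is a small sketch that derives
the retention window from the reopen interval plus a safety margin.
RetentionWindow and its methods are hypothetical names, and the 10 minute
margin in the usage note is only an example value, not a recommendation
from Lucene.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a Lucene class): turns "reopen every N minutes,
// plus a comfortable margin" into a retention window in milliseconds.
final class RetentionWindow {
    private RetentionWindow() {}

    static long retentionMillis(long reopenIntervalMinutes, long marginMinutes) {
        return TimeUnit.MINUTES.toMillis(reopenIntervalMinutes + marginMinutes);
    }

    /**
     * True only if the commit is strictly older than the window.  If the
     * clock jumped backwards the age is negative, so the commit is kept.
     */
    static boolean oldEnough(long commitTimestampMillis, long nowMillis,
                             long retentionMillis) {
        long ageMillis = nowMillis - commitTimestampMillis;
        return ageMillis > retentionMillis;
    }
}
```

For example, retentionMillis(10, 10) gives a 20 minute window for readers
that reopen every 10 minutes.  A backwards clock jump is harmless here
(commits just live longer); a forwards jump is the case the margin has to
absorb.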

Really, a better overall design would be a hard handshake with all
outstanding readers, so that only once every single reader using a
given commit has closed, do you delete the commit.  Then you are
immune to clock unreliability.... but this'd require remote communication
in your app to track reader states.
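
One way to picture the handshake is a per-commit reference count.  The
sketch below is in-process only and is an assumption about how such
tracking might look, not something Lucene provides: CommitUsageRegistry is
a hypothetical class, and the long key is assumed to identify a commit (for
example the value of IndexCommit.getGeneration()).  The remote
communication needed to feed it from readers in other processes is left
out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry (not a Lucene class): counts open readers per
// commit so a deletion policy can ask whether a commit is still in use.
final class CommitUsageRegistry {
    private final Map<Long, Integer> readersPerCommit = new HashMap<>();

    /** Call when a reader has been opened on the given commit. */
    synchronized void readerOpened(long commitGeneration) {
        readersPerCommit.merge(commitGeneration, 1, Integer::sum);
    }

    /** Call when a reader on the given commit has been closed. */
    synchronized void readerClosed(long commitGeneration) {
        Integer count = readersPerCommit.get(commitGeneration);
        if (count == null) {
            throw new IllegalStateException(
                "no open reader recorded for commit " + commitGeneration);
        }
        if (count == 1) {
            readersPerCommit.remove(commitGeneration);
        } else {
            readersPerCommit.put(commitGeneration, count - 1);
        }
    }

    /** A commit is safe to delete only when this returns false. */
    synchronized boolean isInUse(long commitGeneration) {
        return readersPerCommit.containsKey(commitGeneration);
    }
}
```

A deletion policy built on this would skip any commit for which isInUse
returns true, no matter how old it is.  To be safe, a reader would have to
register before it starts opening the commit's files, otherwise there is
still a window in which the commit looks unused.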

Also, you should remove that dangerous auto-generated catch block.  It
may suppress a real exception some day... and onCommit is allowed to
throw IOException.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Feb 7, 2012 at 9:15 PM, superruiye <superruiye@gmail.com> wrote:
> public class PostponeCommitDeletionPolicy implements IndexDeletionPolicy {
>        private final static long deletionPostPone = 600000;
>
>        public void onInit(List<? extends IndexCommit> commits) {
>                // Note that commits.size() should normally be 1:
>                onCommit(commits);
>        }
>
>        /**
>         * delete commits after deletePostPone ms.
>         */
>        public void onCommit(List<? extends IndexCommit> commits) {
>                // Note that commits.size() should normally be 2 (if not
>                // called by onInit above):
>                int size = commits.size();
>                try {
>                        long lastCommitTimestamp = commits.get(commits.size() - 1).getTimestamp();
>                        for (int i = 0; i < size - 1; i++) {
>                                if (lastCommitTimestamp - commits.get(i).getTimestamp() > deletionPostPone) {
>                                        commits.get(i).delete();
>                                }
>                        }
>                } catch (IOException e) {
>                        // TODO Auto-generated catch block
>                        e.printStackTrace();
>                }
>        }
> }
> ----------------------------------
> indexWriterConfig.setIndexDeletionPolicy(new PostponeCommitDeletionPolicy());
> ----------------------------------
> and I use a timer task (every 10 minutes) to reopen the IndexSearcher, but
> I still get read past EOF... the trace:
> java.io.IOException: read past EOF
>        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)
>        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
>        at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
>        at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:153)
>        at org.apache.lucene.index.TermVectorsReader.checkValidFormat(TermVectorsReader.java:197)
>        at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:86)
>        at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:221)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:117)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:93)
>        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
>        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
>        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
>        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
>        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:421)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:281)
>        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:89)
>        at com.ableskysearch.migration.timertask.ReopenIndexSearcherTask.runAsPeriod(ReopenIndexSearcherTask.java:40)
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Why-read-past-EOF-tp3639401p3724672.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
