lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: File timestamps / subclassing of IndexReader
Date Wed, 19 Nov 2003 20:03:34 GMT
Hello Christoph,

As far as I can tell, this patch looks good.
I have a question, though.  You changed the constructor to
SegmentReader to take both SegmentInfos and a SegmentInfo.
Is there a need to pass both?
Also, in SegmentReader you do:

-            directory().touchFile("segments");
+            if(segmentInfos != null)
+              segmentInfos.write(directory());

Can segmentInfos ever be == null at this point?

I don't have the source code here to check what SegmentInfos'
write(...) method does, so I can't tell why it was added here.

I also spotted a few instances of a typo in the Javadoc: "segmets"
instead of "segments".

Looks good, though.  Oh, I don't know enough about FilterIndexReader,
so I can't make a call about that segmentInfos, but I Doug's comment
about the 'protected' access modifier sounds valid to me.

Otis


--- Christoph Goller <goller@detego-software.de> wrote:
> Sorry for the delay. I was busy with other things and it took some
> time to fix the problems that Doug mentioned.
> 
> Furthermore, thanks for the links concerning dublicate checking.
> 
> I would like to commit my timestamp patch in the next few days.
> So my question goes to  the mailing list whether there are any
> objections.
> 
> Doug Cutting schrieb:
> > I like this approach, but I have two concerns:
> > 
> > 1. For back-compatibility, the version number should be written at
> the 
> > end of the "segments" file, not at its start.  This may make things
> a 
> > big slower, but we should not incompatibly change the file format: 
> > existing indexes should still open correctly.
> 
> done.
> I also made some tests. Reading and old index and changing it works
> now.
> 
> > 
> > 2. There should be public IndexReader.getVersion methods,
> correspodning 
> > to the existing IndexReader.getLastModified methods.  The
> lastModified 
> > methods should be deprecated, with a comment indicating that the 
> > getVersion methods should be used instead.
> > 
> 
> done
> 
> please have a look at the latest version of this patch in this mail
> (again to be applied to index directory).
> 
> Some thoughts about FilterIndexReader:
> I noticed that all synchronization for a FilterIndexReader is done
> by the FilterIndexReader itself, more precisely by its superclass
> IndexReader and its methods delete and close. The synchronization
> is not done by FilterIndexReader's member in. Since I introduced
> a SegmentInfos member in IndexReader for synchronization purposes
> it is needed there also for the superclass of a FilterIndexReader.
> I changed the constructor of FilterIndexReader to achieve that.
> 
> 
> Christoph
> 
> 
> > Index: FilterIndexReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/FilterIndexReader.java,v
> retrieving revision 1.3
> diff -u -r1.3 FilterIndexReader.java
> --- FilterIndexReader.java	6 Nov 2003 15:30:57 -0000	1.3
> +++ FilterIndexReader.java	19 Nov 2003 14:23:36 -0000
> @@ -114,6 +114,7 @@
>  
>    public FilterIndexReader(IndexReader in) {
>      super(in.directory());
> +    segmentInfos = in.segmentInfos;
>      this.in = in;
>    }
>  
> Index: IndexReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java,v
> retrieving revision 1.22
> diff -u -r1.22 IndexReader.java
> --- IndexReader.java	21 Oct 2003 18:24:23 -0000	1.22
> +++ IndexReader.java	19 Nov 2003 14:23:36 -0000
> @@ -83,14 +83,14 @@
>  public abstract class IndexReader {
>    protected IndexReader(Directory directory) {
>      this.directory = directory;
> -    segmentInfosAge = Long.MAX_VALUE;
> +    stale = false;
> +    segmentInfos = null;
>    }
>  
>    private Directory directory;
>    private Lock writeLock;
> -
> -  //used to determine whether index has chaged since reader was
> opened
> -  private long segmentInfosAge;
> +  protected SegmentInfos segmentInfos;
> +  private boolean stale;
>    
>    /** Returns an IndexReader reading the index in an FSDirectory in
> the named
>    path. */
> @@ -111,21 +111,16 @@
>            directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
>            IndexWriter.COMMIT_LOCK_TIMEOUT) {
>            public Object doBody() throws IOException {
> -            IndexReader result = null;
> -            
>              SegmentInfos infos = new SegmentInfos();
>              infos.read(directory);
>              if (infos.size() == 1) {		  // index is optimized
> -                result = new SegmentReader(infos.info(0), true);
> +              return new SegmentReader(infos, infos.info(0), true);
>              } else {
>                  SegmentReader[] readers = new
> SegmentReader[infos.size()];
>                  for (int i = 0; i < infos.size(); i++)
> -                  readers[i] = new SegmentReader(infos.info(i),
> i==infos.size()-1);
> -                result =  new SegmentsReader(directory, readers);
> +                  readers[i] = new SegmentReader(infos,
> infos.info(i), i==infos.size()-1);
> +                return new SegmentsReader(infos, directory,
> readers);
>              }
> -        
> -            result.segmentInfosAge = lastModified(directory);
> -            return result;
>            }
>          }.run();
>      }
> @@ -134,20 +129,89 @@
>    /** Returns the directory this index resides in. */
>    public Directory directory() { return directory; }
>  
> -  /** Returns the time the index in the named directory was last
> modified. */
> +  /** 
> +   * Returns the time the index in the named directory was last
> modified. 
> +   * 
> +   * <p>Synchronization of IndexReader and IndexWriter instances is 
> +   * no longer done via time stamps of the segmets file since the
> time resolution 
> +   * depends on the hardware platform. Instead, a version number is
> maintained
> +   * within the segments file, which is incremented everytime when
> the index is
> +   * changed.</p>
> +   * 
> +   * @deprecated  Replaced by {@link #getCurrentVersion(String)}
> +   * */
>    public static long lastModified(String directory) throws
> IOException {
>      return lastModified(new File(directory));
>    }
>  
> -  /** Returns the time the index in the named directory was last
> modified. */
> +  /** 
> +   * Returns the time the index in the named directory was last
> modified. 
> +   * 
> +   * <p>Synchronization of IndexReader and IndexWriter instances is 
> +   * no longer done via time stamps of the segmets file since the
> time resolution 
> +   * depends on the hardware platform. Instead, a version number is
> maintained
> +   * within the segments file, which is incremented everytime when
> the index is
> +   * changed.</p>
> +   * 
> +   * @deprecated  Replaced by {@link #getCurrentVersion(File)}
> +   * */
>    public static long lastModified(File directory) throws IOException
> {
>      return FSDirectory.fileModified(directory, "segments");
>    }
>  
> -  /** Returns the time the index in this directory was last
> modified. */
> +  /** 
> +   * Returns the time the index in the named directory was last
> modified. 
> +   * 
> +   * <p>Synchronization of IndexReader and IndexWriter instances is 
> +   * no longer done via time stamps of the segmets file since the
> time resolution 
> +   * depends on the hardware platform. Instead, a version number is
> maintained
> +   * within the segments file, which is incremented everytime when
> the index is
> +   * changed.</p>
> +   * 
> +   * @deprecated  Replaced by {@link #getCurrentVersion(Directory)}
> +   * */
>    public static long lastModified(Directory directory) throws
> IOException {
>      return directory.fileModified("segments");
>    }
> +  
> +  /**
> +   * Reads version number from segments files. The version number
> counts the
> +   * number of changes of the index.
> +   * 
> +   * @param directory where the index resides.
> +   * @return version number.
> +   * @throws IOException if segments file cannot be read
> +   */
> +  public static long getCurrentVersion(String directory) throws
> IOException {
> +    return getCurrentVersion(new File(directory));
> +  }
> +  
> +  /**
> +   * Reads version number from segments files. The version number
> counts the
> +   * number of changes of the index.
> +   * 
> +   * @param directory where the index resides.
> +   * @return version number.
> +   * @throws IOException if segments file cannot be read
> +   */
> +  public static long getCurrentVersion(File directory) throws
> IOException {
> +    Directory dir = FSDirectory.getDirectory(directory, false);
> +    long version = getCurrentVersion(dir);
> +    dir.close();
> +    return version;
> +  }
> +  
> +  /**
> +   * Reads version number from segments files. The version number
> counts the
> +   * number of changes of the index.
> +   * 
> +   * @param directory where the index resides.
> +   * @return version number.
> +   * @throws IOException if segments file cannot be read.
> +   */
> +  public static long getCurrentVersion(Directory directory) throws
> IOException {
> +    return SegmentInfos.readCurrentVersion(directory);
> +  }
>  
>    /**
>     * Returns <code>true</code> if an index exists at the specified
> directory.
> @@ -274,6 +338,9 @@
>      this will be corrected eventually as the index is further
> modified.
>    */
>    public final synchronized void delete(int docNum) throws
> IOException {
> +    if(stale)
> +      throw new IOException("IndexReader out of date and no longer
> valid for deletion");
> +      
>      if (writeLock == null) {
>        Lock writeLock =
> directory.makeLock(IndexWriter.WRITE_LOCK_NAME);
>        if (!writeLock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT)) //
> obtain write lock
> @@ -282,11 +349,11 @@
>  
>        // we have to check whether index has changed since this
> reader was opened.
>        // if so, this reader is no longer valid for deletion
> -      if(lastModified(directory) > segmentInfosAge){
> +      if(segmentInfos != null  &&
> SegmentInfos.readCurrentVersion(directory) >
> segmentInfos.getVersion()){
> +          stale = true;
>            this.writeLock.release();
>            this.writeLock = null;
> -          throw new IOException(
> -            "IndexReader out of date and no longer valid for
> deletion");
> +          throw new IOException("IndexReader out of date and no
> longer valid for deletion");
>        }
>      }
>      doDelete(docNum);
> Index: SegmentInfos.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentInfos.java,v
> retrieving revision 1.2
> diff -u -r1.2 SegmentInfos.java
> --- SegmentInfos.java	18 Nov 2003 11:35:57 -0000	1.2
> +++ SegmentInfos.java	19 Nov 2003 14:23:36 -0000
> @@ -61,22 +61,28 @@
>  import org.apache.lucene.store.OutputStream;
>  
>  final class SegmentInfos extends Vector {
> -  public int counter = 0;			  // used to name new segments
> +  public int counter = 0;    // used to name new segments
> +  private long version = 0; //counts how often the index has been
> changed by adding or deleting docs
>  
>    public final SegmentInfo info(int i) {
> -    return (SegmentInfo)elementAt(i);
> +    return (SegmentInfo) elementAt(i);
>    }
>  
>    public final void read(Directory directory) throws IOException {
>      InputStream input = directory.openFile("segments");
>      try {
> -      counter = input.readInt();		          // read counter
> +      counter = input.readInt(); // read counter
>        for (int i = input.readInt(); i > 0; i--) { // read
> segmentInfos
> -        SegmentInfo si = new SegmentInfo(input.readString(),
> input.readInt(),
> -          directory);
> +        SegmentInfo si =
> +          new SegmentInfo(input.readString(), input.readInt(),
> directory);
>          addElement(si);
>        }
> -    } finally {
> +      if (input.getFilePointer() >= input.length())
> +        version = 0; // old file format without version number
> +      else
> +        version = input.readLong(); // read version
> +    }
> +    finally {
>        input.close();
>      }
>    }
> @@ -84,18 +90,41 @@
>    public final void write(Directory directory) throws IOException {
>      OutputStream output = directory.createFile("segments.new");
>      try {
> -      output.writeInt(counter);			  // write counter
> -      output.writeInt(size());			  // write infos
> +      output.writeInt(counter); // write counter
> +      output.writeInt(size()); // write infos
>        for (int i = 0; i < size(); i++) {
>          SegmentInfo si = info(i);
>          output.writeString(si.name);
>          output.writeInt(si.docCount);
>        }
> -    } finally {
> +      output.writeLong(++version); // every write changes the index 
>        
> +    }
> +    finally {
>        output.close();
>      }
>  
>      // install new segment info
>      directory.renameFile("segments.new", "segments");
> +  }
> +
> +  /**
> +   * version number when this SegmentInfos was generated.
> +   */
> +  public long getVersion() {
> +    return version;
> +  }
> +
> +  /**
> +   * Current version number from segments file.
> +   */
> +  public static long readCurrentVersion(Directory directory)
> +    throws IOException {
> +
> +    // We cannot be sure whether the segments file is in the old
> format or the new one.
> +    // Therefore we have to read the whole file and cannot simple
> seek to the version entry.
> +
> +    SegmentInfos sis = new SegmentInfos();
> +    sis.read(directory);
> +    return sis.getVersion();
>    }
>  }
> Index: SegmentReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentReader.java,v
> retrieving revision 1.15
> diff -u -r1.15 SegmentReader.java
> --- SegmentReader.java	21 Oct 2003 18:24:23 -0000	1.15
> +++ SegmentReader.java	19 Nov 2003 14:23:37 -0000
> @@ -98,10 +98,11 @@
>    }
>    private Hashtable norms = new Hashtable();
>  
> -  SegmentReader(SegmentInfo si, boolean closeDir)
> +  SegmentReader(SegmentInfos sis, SegmentInfo si, boolean closeDir)
>      throws IOException {
>      this(si);
>      closeDirectory = closeDir;
> +    segmentInfos = sis;
>    }
>  
>    SegmentReader(SegmentInfo si)
> @@ -141,7 +142,10 @@
>            public Object doBody() throws IOException {
>              deletedDocs.write(directory(), segment + ".tmp");
>              directory().renameFile(segment + ".tmp", segment +
> ".del");
> -            directory().touchFile("segments");
> +            if(segmentInfos != null)
> +              segmentInfos.write(directory());
> +            else
> +              directory().touchFile("segments");
>              return null;
>            }
>          }.run();
> Index: SegmentsReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentsReader.java,v
> retrieving revision 1.15
> diff -u -r1.15 SegmentsReader.java
> --- SegmentsReader.java	31 Oct 2003 09:46:54 -0000	1.15
> +++ SegmentsReader.java	19 Nov 2003 14:23:37 -0000
> @@ -78,6 +78,11 @@
>    private int numDocs = -1;
>    private boolean hasDeletions = false;
>  
> +  SegmentsReader(SegmentInfos sis, Directory directory,
> SegmentReader[] r) throws IOException {
> +      this(directory, r);
> +      segmentInfos = sis;
> +  }
> +  
>    SegmentsReader(Directory directory, SegmentReader[] r) throws
> IOException {
>      super(directory);
>      readers = r;
> 
> >
---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


__________________________________
Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard
http://antispam.yahoo.com/whatsnewfree

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message