lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitry Serebrennikov <dmit...@earthlink.net>
Subject Re: CompoundFileReader
Date Thu, 16 Oct 2003 23:16:38 GMT
Dear Christoph,

Sounds like an excellent enhancement. From a quick look, it appears that 
you are right and everything should work just fine but use less memory. 
One question: have you tried the other test cases also or just the 
TestCompoundFile. There are quite a few conditions that TestCompoundFile 
does not cover.

At first I thought that the synchronization around readBytes would cause 
too much performance degradation when a lot of concurrent queries were 
executing. But after I looked at it some more, I convinced myself that 
it should be ok. Have you ran any multi-threaded tests / benchmarks? I 
think it might also be a good idea before making this change.

Christoph, do you think it is possible to just call readInternal on the 
base stream instead of the readBytes? The main difference is that we 
would bypass the buffering in the base stream. I think the buffering 
done by the superclass of the CSInputStream would be quite enough (which 
is your point to begin with, right)? Perhaps it would be worthwhile to 
make InputStream.readInternal() public instead of protected?

Thanks.
Dmitry.


Christoph Goller wrote:

> Dear Dmitry,
>
> I finally found time to look into your compound file implementation and
> to try it out. It works and I like it. However, I wondered, why you are
> using clones of the base input stream in 
> CompoundFileReader.CSInputStream.
> The "problem" (actually it is not a real problem) I see is that you are
> double buffering your input. Note that the clone has a buffer of 1024 
> bytes and
> that CompoundFileReader.CSInputStream, which extends InputStream, also 
> has
> such a buffer. I think in the way you implemented it your input gets 
> copied
> into the base buffer and from there into the 
> CompoundFileReader.CSInputStream
> buffer before it is actually used. Please have a look at my patch. At 
> first
> one might think that my implementation is simpler but less efficient. 
> This
> is not the case. Actually it is even a little bit more efficient. 
> Points to
> think about:
>
> *) stream is private to CompoundFileReader, so nobody else can use it, 
> if not
> explicitly granted access. CompoundFileReader uses stream only in its
> constructor and when calling openFile.
>
> *) Therefore, CompoundFileReader.CSInputStream can share the same 
> instance of
> stream if they take care of synchronization and seek.
>
> *) An InputStream not necessarily has to implement seekInternal if 
> readInternal
> takes care of seeking on the real stream (see FSInputStream).
>
> *) Synchronization of CompoundFileReader.CSInputStream.readInternal on 
> base
> does no harm since it is called only when the real file has to be 
> accessed
> and thi is synchronized anyway. However, this synchronizytion is 
> necessary!
>
> I tested my patch with your TestCompoundFile and everything seems fine.
> I also tried it on one of my indices and searched on this index. It also
> works. My impression from my tests is that my patch is a little bit 
> faster
> than your original version but there seems to be not much difference in
> efficiency.
>
> Christoph
>
>------------------------------------------------------------------------
>
>Index: CompoundFileReader.java
>===================================================================
>RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileReader.java,v
>retrieving revision 1.2
>diff -u -r1.2 CompoundFileReader.java
>--- CompoundFileReader.java	13 Oct 2003 14:18:04 -0000	1.2
>+++ CompoundFileReader.java	16 Oct 2003 18:58:02 -0000
>@@ -240,10 +240,9 @@
>                       final long length)
>         throws IOException
>         {
>-            this.base = (InputStream) base.clone();
>+            this.base = base;
>             this.fileOffset = fileOffset;
>             this.length = length;   // variable in the superclass
>-            seekInternal(0);        // position to the adjusted 0th byte
>         }
> 
>         /** Expert: implements buffer refill.  Reads bytes from the current
>@@ -255,7 +254,10 @@
>         protected void readInternal(byte[] b, int offset, int len)
>         throws IOException
>         {
>-            base.readBytes(b, offset, len);
>+            synchronized (base) {
>+              base.seek(fileOffset + getFilePointer());
>+              base.readBytes(b, offset, len);
>+            }
>         }
> 
>         /** Expert: implements seek.  Sets current position in this file, where
>@@ -263,35 +265,11 @@
>          * @see #readInternal(byte[],int,int)
>          */
>         protected void seekInternal(long pos) throws IOException
>-        {
>-            if (pos > 0 && pos >= length)
>-                throw new IOException("Seek past the end of file");
>-
>-            if (pos < 0)
>-                throw new IOException("Seek to a negative offset");
>-
>-            base.seek(fileOffset + pos);
>-        }
>+        {}
> 
>         /** Closes the stream to futher operations. */
>         public void close() throws IOException
>-        {
>-            base.close();
>-        }
>+        {}
> 
>-        /** Returns a clone of this stream.
>-         *
>-         * <p>Clones of a stream access the same data, and are positioned at the
same
>-         * point as the stream they were cloned from.
>-         *
>-         * <p>Expert: Subclasses must ensure that clones may be positioned at
>-         * different points in the input from each other and from the stream they
>-         * were cloned from.
>-         */
>-        public Object clone() {
>-            CSInputStream other = (CSInputStream) super.clone();
>-            other.base = (InputStream) base.clone();
>-            return other;
>-        }
>     }
> }
>
>  
>
>------------------------------------------------------------------------
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message