lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christoph Goller <gol...@detego-software.de>
Subject Re: CompoundFileReader
Date Sat, 18 Oct 2003 17:07:22 GMT
Hi Dmitry,

Now I tried all test cases. They all work except for Russian analyser/stemmer
and occational fails of TestIndexReader (the timestamp problem). So I think
it should be ok as far as CoumpoundFile is concerned. Off course we still
have to find a good solution for the timestamp problem.

However,I stumbled over a problem that I had missed last time. TestCompoundFile
only succedds with your index bound tests in CSInputStream.seekInternal.
On Thursday I had deleted them after trying your test cases because the other
implementations donĀ“t do these tests either. I did not go too deep into your
tests, but do you think the bahaviour of throwing an exception if the seek
index is out of bound is required? Its not part of the contract of the other
implementations of InputStream. Maybe I am missing something here.

Dmitry Serebrennikov schrieb:
> Dear Christoph,
> 
> Sounds like an excellent enhancement. From a quick look, it appears that 
> you are right and everything should work just fine but use less memory. 
> One question: have you tried the other test cases also or just the 
> TestCompoundFile. There are quite a few conditions that TestCompoundFile 
> does not cover.
> 
> At first I thought that the synchronization around readBytes would cause 
> too much performance degradation when a lot of concurrent queries were 
> executing. But after I looked at it some more, I convinced myself that 
> it should be ok. Have you ran any multi-threaded tests / benchmarks? I 
> think it might also be a good idea before making this change.
> 
> Christoph, do you think it is possible to just call readInternal on the 
> base stream instead of the readBytes? The main difference is that we 
> would bypass the buffering in the base stream. I think the buffering 
> done by the superclass of the CSInputStream would be quite enough (which 
> is your point to begin with, right)? Perhaps it would be worthwhile to 
> make InputStream.readInternal() public instead of protected?

In CSInputStream.readInternal I call:

synchronized (base) {
   base.seek(fileOffset + getFilePointer());
   base.readBytes(b, offset, len);
}

Calling base.seek does nothing more than setting the file pointer
(bufferStart + bufferPosition) of base correctly.

base.readBytes(b, offset, len) in this case does not use the buffer of
base (at least in most cases). Look into InputStream.readBytes.
If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
used directly.

I think synchronized in our case does not much more than synchronizing
on the actual file in FSInputStream.readInternal.

Christoph



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message