lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christoph Goller <>
Subject Re: CompoundFileReader
Date Mon, 20 Oct 2003 15:39:03 GMT
I am not too concerned about the index bound checks. They are an additional
safety and I don´t think they matter in terms of efficiency. The only point
is that I forgot them in my patch and without them the tests fail. So if you
decide to apply my patch, the checks have to be added. If you give your ok,
I can do the commit too.


Dmitry Serebrennikov schrieb:
> I put those in mostly to assure myself that I got things right. I think 
> the key question is whether it possible to read part of another file. If 
> not, I think that's fine. If yes, I think that's a problem.
> Dmitry.
> Christoph Goller wrote:
>> Hi Dmitry,
>> Now I tried all test cases. They all work except for Russian 
>> analyser/stemmer
>> and occational fails of TestIndexReader (the timestamp problem). So I 
>> think
>> it should be ok as far as CoumpoundFile is concerned. Off course we still
>> have to find a good solution for the timestamp problem.
>> However,I stumbled over a problem that I had missed last time. 
>> TestCompoundFile
>> only succedds with your index bound tests in CSInputStream.seekInternal.
>> On Thursday I had deleted them after trying your test cases because 
>> the other
>> implementations don´t do these tests either. I did not go too deep 
>> into your
>> tests, but do you think the bahaviour of throwing an exception if the 
>> seek
>> index is out of bound is required? Its not part of the contract of the 
>> other
>> implementations of InputStream. Maybe I am missing something here.
>> Dmitry Serebrennikov schrieb:
>>> Dear Christoph,
>>> Sounds like an excellent enhancement. From a quick look, it appears 
>>> that you are right and everything should work just fine but use less 
>>> memory. One question: have you tried the other test cases also or 
>>> just the TestCompoundFile. There are quite a few conditions that 
>>> TestCompoundFile does not cover.
>>> At first I thought that the synchronization around readBytes would 
>>> cause too much performance degradation when a lot of concurrent 
>>> queries were executing. But after I looked at it some more, I 
>>> convinced myself that it should be ok. Have you ran any 
>>> multi-threaded tests / benchmarks? I think it might also be a good 
>>> idea before making this change.
>>> Christoph, do you think it is possible to just call readInternal on 
>>> the base stream instead of the readBytes? The main difference is that 
>>> we would bypass the buffering in the base stream. I think the 
>>> buffering done by the superclass of the CSInputStream would be quite 
>>> enough (which is your point to begin with, right)? Perhaps it would 
>>> be worthwhile to make InputStream.readInternal() public instead of 
>>> protected?
>> In CSInputStream.readInternal I call:
>> synchronized (base) {
>> + getFilePointer());
>>   base.readBytes(b, offset, len);
>> }
>> Calling does nothing more than setting the file pointer
>> (bufferStart + bufferPosition) of base correctly.
>> base.readBytes(b, offset, len) in this case does not use the buffer of
>> base (at least in most cases). Look into InputStream.readBytes.
>> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
>> used directly.
>> I think synchronized in our case does not much more than synchronizing
>> on the actual file in FSInputStream.readInternal.
>> Christoph
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

* Dr. Christoph Goller     Tel. : +49 89 203 45734          *
* Managing Director        Email: *
* Detego Software GmbH     Mail : Keuslinstr. 13,           *
*                                 80798 München, Germany    *

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message