lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Christoph Goller <gol...@detego-software.de>
Subject Re: CompoundFileReader
Date Mon, 20 Oct 2003 15:39:03 GMT
I am not too concerned about the index bound checks. They are an additional
safety and I don´t think they matter in terms of efficiency. The only point
is that I forgot them in my patch and without them the tests fail. So if you
decide to apply my patch, the checks have to be added. If you give your ok,
I can do the commit too.

Christoph

Dmitry Serebrennikov schrieb:
> I put those in mostly to assure myself that I got things right. I think 
> the key question is whether it possible to read part of another file. If 
> not, I think that's fine. If yes, I think that's a problem.
> 
> Dmitry.
> 
> Christoph Goller wrote:
> 
>> Hi Dmitry,
>>
>> Now I tried all test cases. They all work except for Russian 
>> analyser/stemmer
>> and occational fails of TestIndexReader (the timestamp problem). So I 
>> think
>> it should be ok as far as CoumpoundFile is concerned. Off course we still
>> have to find a good solution for the timestamp problem.
>>
>> However,I stumbled over a problem that I had missed last time. 
>> TestCompoundFile
>> only succedds with your index bound tests in CSInputStream.seekInternal.
>> On Thursday I had deleted them after trying your test cases because 
>> the other
>> implementations don´t do these tests either. I did not go too deep 
>> into your
>> tests, but do you think the bahaviour of throwing an exception if the 
>> seek
>> index is out of bound is required? Its not part of the contract of the 
>> other
>> implementations of InputStream. Maybe I am missing something here.
>>
>> Dmitry Serebrennikov schrieb:
>>
>>> Dear Christoph,
>>>
>>> Sounds like an excellent enhancement. From a quick look, it appears 
>>> that you are right and everything should work just fine but use less 
>>> memory. One question: have you tried the other test cases also or 
>>> just the TestCompoundFile. There are quite a few conditions that 
>>> TestCompoundFile does not cover.
>>>
>>> At first I thought that the synchronization around readBytes would 
>>> cause too much performance degradation when a lot of concurrent 
>>> queries were executing. But after I looked at it some more, I 
>>> convinced myself that it should be ok. Have you ran any 
>>> multi-threaded tests / benchmarks? I think it might also be a good 
>>> idea before making this change.
>>>
>>> Christoph, do you think it is possible to just call readInternal on 
>>> the base stream instead of the readBytes? The main difference is that 
>>> we would bypass the buffering in the base stream. I think the 
>>> buffering done by the superclass of the CSInputStream would be quite 
>>> enough (which is your point to begin with, right)? Perhaps it would 
>>> be worthwhile to make InputStream.readInternal() public instead of 
>>> protected?
>>
>>
>>
>> In CSInputStream.readInternal I call:
>>
>> synchronized (base) {
>>   base.seek(fileOffset + getFilePointer());
>>   base.readBytes(b, offset, len);
>> }
>>
>> Calling base.seek does nothing more than setting the file pointer
>> (bufferStart + bufferPosition) of base correctly.
>>
>> base.readBytes(b, offset, len) in this case does not use the buffer of
>> base (at least in most cases). Look into InputStream.readBytes.
>> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
>> used directly.
>>
>> I think synchronized in our case does not much more than synchronizing
>> on the actual file in FSInputStream.readInternal.
>>
>> Christoph
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 

-- 
*************************************************************
* Dr. Christoph Goller     Tel. : +49 89 203 45734          *
* Managing Director        Email: goller@detego-software.de *
* Detego Software GmbH     Mail : Keuslinstr. 13,           *
*                                 80798 München, Germany    *
*************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message