lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitry Serebrennikov <dmit...@earthlink.net>
Subject Re: CompoundFileReader
Date Sat, 18 Oct 2003 19:12:17 GMT
I put those in mostly to assure myself that I got things right. I think 
the key question is whether it possible to read part of another file. If 
not, I think that's fine. If yes, I think that's a problem.

Dmitry.

Christoph Goller wrote:

> Hi Dmitry,
>
> Now I tried all test cases. They all work except for Russian 
> analyser/stemmer
> and occational fails of TestIndexReader (the timestamp problem). So I 
> think
> it should be ok as far as CoumpoundFile is concerned. Off course we still
> have to find a good solution for the timestamp problem.
>
> However,I stumbled over a problem that I had missed last time. 
> TestCompoundFile
> only succedds with your index bound tests in CSInputStream.seekInternal.
> On Thursday I had deleted them after trying your test cases because 
> the other
> implementations donĀ“t do these tests either. I did not go too deep 
> into your
> tests, but do you think the bahaviour of throwing an exception if the 
> seek
> index is out of bound is required? Its not part of the contract of the 
> other
> implementations of InputStream. Maybe I am missing something here.
>
> Dmitry Serebrennikov schrieb:
>
>> Dear Christoph,
>>
>> Sounds like an excellent enhancement. From a quick look, it appears 
>> that you are right and everything should work just fine but use less 
>> memory. One question: have you tried the other test cases also or 
>> just the TestCompoundFile. There are quite a few conditions that 
>> TestCompoundFile does not cover.
>>
>> At first I thought that the synchronization around readBytes would 
>> cause too much performance degradation when a lot of concurrent 
>> queries were executing. But after I looked at it some more, I 
>> convinced myself that it should be ok. Have you ran any 
>> multi-threaded tests / benchmarks? I think it might also be a good 
>> idea before making this change.
>>
>> Christoph, do you think it is possible to just call readInternal on 
>> the base stream instead of the readBytes? The main difference is that 
>> we would bypass the buffering in the base stream. I think the 
>> buffering done by the superclass of the CSInputStream would be quite 
>> enough (which is your point to begin with, right)? Perhaps it would 
>> be worthwhile to make InputStream.readInternal() public instead of 
>> protected?
>
>
> In CSInputStream.readInternal I call:
>
> synchronized (base) {
>   base.seek(fileOffset + getFilePointer());
>   base.readBytes(b, offset, len);
> }
>
> Calling base.seek does nothing more than setting the file pointer
> (bufferStart + bufferPosition) of base correctly.
>
> base.readBytes(b, offset, len) in this case does not use the buffer of
> base (at least in most cases). Look into InputStream.readBytes.
> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
> used directly.
>
> I think synchronized in our case does not much more than synchronizing
> on the actual file in FSInputStream.readInternal.
>
> Christoph
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message