lucene-dev mailing list archives

From Hani Suleiman
Subject Re: CompoundFileReader
Date Sat, 18 Oct 2003 17:07:35 GMT
What Russian stemmer problems? I submitted a fix for that a few
weeks ago, which was checked in, so there shouldn't be any more problems
with it.

On Saturday, October 18, 2003, at 01:07 PM, Christoph Goller wrote:

> Hi Dmitry,
> Now I tried all test cases. They all work except for the Russian
> analyser/stemmer and occasional failures of TestIndexReader (the
> timestamp problem). So I think it should be OK as far as CompoundFile
> is concerned. Of course we still have to find a good solution for the
> timestamp problem.
> However, I stumbled over a problem that I had missed last time.
> TestCompoundFile only succeeds with your index bounds tests in
> CSInputStream.seekInternal. On Thursday I had deleted them after
> trying your test cases because the other implementations don't do
> these tests either. I did not go too deep into your tests, but do you
> think the behaviour of throwing an exception if the seek index is out
> of bounds is required? It's not part of the contract of the other
> implementations of InputStream. Maybe I am missing something here.
> Dmitry Serebrennikov wrote:
>> Dear Christoph,
>> Sounds like an excellent enhancement. From a quick look, it appears 
>> that you are right and everything should work just fine but use less 
>> memory. One question: have you tried the other test cases also, or 
>> just TestCompoundFile? There are quite a few conditions that 
>> TestCompoundFile does not cover.
>> At first I thought that the synchronization around readBytes would 
>> cause too much performance degradation when a lot of concurrent 
>> queries were executing. But after I looked at it some more, I 
>> convinced myself that it should be ok. Have you run any 
>> multi-threaded tests / benchmarks? I think it might also be a good 
>> idea before making this change.
>> Christoph, do you think it is possible to just call readInternal on 
>> the base stream instead of the readBytes? The main difference is that 
>> we would bypass the buffering in the base stream. I think the 
>> buffering done by the superclass of the CSInputStream would be quite 
>> enough (which is your point to begin with, right)? Perhaps it would 
>> be worthwhile to make InputStream.readInternal() public instead of 
>> protected?
> In CSInputStream.readInternal I call:
> synchronized (base) {
> + getFilePointer());
>   base.readBytes(b, offset, len);
> }
> Calling does nothing more than setting the file pointer
> (bufferStart + bufferPosition) of base correctly.
> base.readBytes(b, offset, len) in this case does not use the buffer of
> base (at least in most cases). Look into InputStream.readBytes.
> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
> used directly.
> I think synchronized in our case does not do much more than the
> synchronization on the actual file in FSInputStream.readInternal.
> Christoph
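
For readers following the thread, here is a minimal, self-contained sketch of the pattern under discussion: several sub-streams share one base stream, and each read positions the base and reads from it inside a single synchronized block. It is not the actual Lucene code. BaseStream, SubStream and their fields are hypothetical stand-ins, and the sketch assumes that a seek on the base precedes the read, as the remark about setting the file pointer suggests. The bounds check in seek() illustrates the behaviour asked about above; whether it belongs in the InputStream contract is exactly the open question.

```java
import java.io.IOException;

public class SharedBaseSketch {

    /** Hypothetical stand-in for the shared base stream (not Lucene's class). */
    public static class BaseStream {
        private final byte[] data;
        private int position;

        public BaseStream(byte[] data) { this.data = data; }

        public void seek(long pos) { position = (int) pos; }

        public void readBytes(byte[] b, int offset, int len) throws IOException {
            if (position + len > data.length) throw new IOException("read past EOF");
            System.arraycopy(data, position, b, offset, len);
            position += len;
        }
    }

    /** Hypothetical view onto bytes [fileOffset, fileOffset + length) of the base. */
    public static class SubStream {
        private final BaseStream base;
        private final long fileOffset;
        private final long length;
        private long filePointer = 0;

        public SubStream(BaseStream base, long fileOffset, long length) {
            this.base = base;
            this.fileOffset = fileOffset;
            this.length = length;
        }

        /** Assumed behaviour: reject seeks outside the sub-file. */
        public void seek(long pos) throws IOException {
            if (pos < 0 || pos > length) throw new IOException("seek out of bounds: " + pos);
            filePointer = pos;
        }

        public void readBytes(byte[] b, int offset, int len) throws IOException {
            if (filePointer + len > length) throw new IOException("read past end of sub-file");
            // Position and read must happen atomically, because other
            // sub-streams move the same base stream's file pointer.
            synchronized (base) {
                base.seek(fileOffset + filePointer);
                base.readBytes(b, offset, len);
            }
            filePointer += len;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        BaseStream base = new BaseStream(data);
        SubStream first = new SubStream(base, 2, 4);   // bytes 2..5
        SubStream second = new SubStream(base, 6, 4);  // bytes 6..9
        byte[] a = new byte[2];
        byte[] b = new byte[2];
        first.readBytes(a, 0, 2);   // reads 2,3
        second.readBytes(b, 0, 2);  // reads 6,7 and moves the base pointer
        first.readBytes(a, 0, 2);   // still reads 4,5 thanks to the re-seek
        System.out.println(a[0] + "," + a[1] + " " + b[0] + "," + b[1]); // prints "4,5 6,7"
    }
}
```

Because each sub-stream keeps its own file pointer and re-seeks the base on every read, interleaved reads from different sub-streams do not disturb each other; the lock only covers the seek-plus-read pair.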