Date: Sat, 18 Oct 2003 13:07:35 -0400
From: Hani Suleiman
To: "Lucene Developers List"
Subject: Re: CompoundFileReader
In-Reply-To: <3F91734A.5070800@detego-software.de>
Message-Id: <91AB4119-018D-11D8-AFF1-000A956D3476@formicary.net>

What Russian stemmer problems?
I submitted a fix for that a few weeks ago, which was checked in, so
there shouldn't be any more problems with it.

On Saturday, October 18, 2003, at 01:07 PM, Christoph Goller wrote:

> Hi Dmitry,
>
> Now I have tried all test cases. They all work except for the Russian
> analyser/stemmer and occasional failures of TestIndexReader (the
> timestamp problem). So I think it should be ok as far as CompoundFile
> is concerned. Of course we still have to find a good solution for the
> timestamp problem.
>
> However, I stumbled over a problem that I had missed last time.
> TestCompoundFile only succeeds with your index bound tests in
> CSInputStream.seekInternal. On Thursday I had deleted them after
> trying your test cases because the other implementations don't do
> these tests either. I did not go too deep into your tests, but do you
> think the behaviour of throwing an exception if the seek index is out
> of bounds is required? It's not part of the contract of the other
> implementations of InputStream. Maybe I am missing something here.
>
> Dmitry Serebrennikov wrote:
>> Dear Christoph,
>> Sounds like an excellent enhancement. From a quick look, it appears
>> that you are right and everything should work just fine but use less
>> memory. One question: have you tried the other test cases also, or
>> just TestCompoundFile? There are quite a few conditions that
>> TestCompoundFile does not cover.
>> At first I thought that the synchronization around readBytes would
>> cause too much performance degradation when a lot of concurrent
>> queries were executing. But after I looked at it some more, I
>> convinced myself that it should be ok. Have you run any
>> multi-threaded tests / benchmarks? I think that might also be a good
>> idea before making this change.
>> Christoph, do you think it is possible to just call readInternal on
>> the base stream instead of readBytes? The main difference is that
>> we would bypass the buffering in the base stream. I think the
>> buffering done by the superclass of CSInputStream would be quite
>> enough (which is your point to begin with, right)? Perhaps it would
>> be worthwhile to make InputStream.readInternal() public instead of
>> protected?
>
> In CSInputStream.readInternal I call:
>
>     synchronized (base) {
>       base.seek(fileOffset + getFilePointer());
>       base.readBytes(b, offset, len);
>     }
>
> Calling base.seek does nothing more than set the file pointer
> (bufferStart + bufferPosition) of base correctly.
>
> base.readBytes(b, offset, len) in this case does not use the buffer of
> base (at least in most cases). Look into InputStream.readBytes.
> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
> used directly.
>
> I think synchronized in our case does not do much more than
> synchronizing on the actual file in FSInputStream.readInternal.
>
> Christoph
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
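
For readers following the thread, the pattern under discussion (several
sub-streams sharing one base stream, each doing seek-then-read while
holding a lock on the base, plus the optional bounds check in seek) can
be sketched roughly as below. This is only an illustration: BaseStream
and SliceStream are hypothetical stand-ins, not Lucene's InputStream or
CSInputStream, and the use of unchecked exceptions is an assumption made
to keep the example short.

```java
/** Hypothetical shared base stream over an in-memory array (not Lucene's InputStream). */
class BaseStream {
    private final byte[] data;
    private int pos;

    BaseStream(byte[] data) {
        this.data = data;
    }

    /** Only moves the file pointer; no I/O happens here. */
    void seek(long p) {
        pos = (int) p;
    }

    /** Copies len bytes from the current position into b, then advances. */
    void readBytes(byte[] b, int offset, int len) {
        if (pos < 0 || pos + len > data.length) {
            throw new IndexOutOfBoundsException("read past end of base stream");
        }
        System.arraycopy(data, pos, b, offset, len);
        pos += len;
    }
}

/** Hypothetical view onto the range [fileOffset, fileOffset + length) of a shared base. */
class SliceStream {
    private final BaseStream base;
    private final long fileOffset;
    private final long length;
    private long filePointer;

    SliceStream(BaseStream base, long fileOffset, long length) {
        this.base = base;
        this.fileOffset = fileOffset;
        this.length = length;
    }

    /** The bounds check questioned in the thread: reject seeks outside the slice. */
    void seek(long p) {
        if (p < 0 || p > length) {
            throw new IndexOutOfBoundsException("seek out of bounds: " + p);
        }
        filePointer = p;
    }

    void readBytes(byte[] b, int offset, int len) {
        if (filePointer + len > length) {
            throw new IndexOutOfBoundsException("read past end of slice");
        }
        // seek and read must be one atomic step, because other slices
        // move the same base file pointer.
        synchronized (base) {
            base.seek(fileOffset + filePointer);
            base.readBytes(b, offset, len);
        }
        filePointer += len;
    }
}
```

Two SliceStream instances created over the same BaseStream can then be
read in any interleaving without disturbing each other, since each read
repositions the base inside the synchronized block.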