From: Dmitry Serebrennikov
Date: Sat, 18 Oct 2003 13:12:17 -0600
To: Lucene Developers List
Subject: Re: CompoundFileReader

I put those in mostly to assure myself that I got things right. I think
the key question is whether it is possible to read part of another file.
If not, I think that's fine.
If yes, I think that's a problem.

Dmitry.

Christoph Goller wrote:

> Hi Dmitry,
>
> Now I tried all test cases. They all work except for the Russian
> analyser/stemmer and occasional failures of TestIndexReader (the
> timestamp problem). So I think it should be ok as far as CompoundFile
> is concerned. Of course we still have to find a good solution for the
> timestamp problem.
>
> However, I stumbled over a problem that I had missed last time.
> TestCompoundFile only succeeds with your index bound tests in
> CSInputStream.seekInternal. On Thursday I had deleted them after trying
> your test cases because the other implementations don't do these tests
> either. I did not go too deep into your tests, but do you think the
> behaviour of throwing an exception if the seek index is out of bounds
> is required? It's not part of the contract of the other implementations
> of InputStream. Maybe I am missing something here.
>
> Dmitry Serebrennikov wrote:
>
>> Dear Christoph,
>>
>> Sounds like an excellent enhancement. From a quick look, it appears
>> that you are right and everything should work just fine but use less
>> memory. One question: have you tried the other test cases also or
>> just the TestCompoundFile? There are quite a few conditions that
>> TestCompoundFile does not cover.
>>
>> At first I thought that the synchronization around readBytes would
>> cause too much performance degradation when a lot of concurrent
>> queries were executing. But after I looked at it some more, I
>> convinced myself that it should be ok. Have you run any
>> multi-threaded tests / benchmarks? I think it might also be a good
>> idea before making this change.
>>
>> Christoph, do you think it is possible to just call readInternal on
>> the base stream instead of the readBytes? The main difference is that
>> we would bypass the buffering in the base stream.
>> I think the buffering done by the superclass of the CSInputStream would
>> be quite enough (which is your point to begin with, right)? Perhaps it
>> would be worthwhile to make InputStream.readInternal() public instead of
>> protected?
>
>
> In CSInputStream.readInternal I call:
>
>     synchronized (base) {
>       base.seek(fileOffset + getFilePointer());
>       base.readBytes(b, offset, len);
>     }
>
> Calling base.seek does nothing more than setting the file pointer
> (bufferStart + bufferPosition) of base correctly.
>
> base.readBytes(b, offset, len) in this case does not use the buffer of
> base (at least in most cases). Look into InputStream.readBytes.
> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
> used directly.
>
> I think synchronized in our case does not much more than synchronizing
> on the actual file in FSInputStream.readInternal.
>
> Christoph
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
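
To make the two points in this thread concrete (the bound checks that stop a sub-stream from reading part of another file, and the synchronized seek-then-read on the shared base stream), here is a minimal sketch. It is not the Lucene code: BaseStream, SliceStream and SliceDemo are hypothetical names, the base stream is an in-memory byte array standing in for the compound file, and the check is placed in both seek and readBytes purely for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only; all class names here are assumptions, not Lucene API. */
public class SliceDemo {

    /** Stand-in for the shared base stream: several sub-files stored back to back. */
    static final class BaseStream {
        private final byte[] data;
        private int pos;

        BaseStream(byte[] data) { this.data = data; }

        void seek(int newPos) { pos = newPos; }

        void readBytes(byte[] dst, int offset, int len) {
            System.arraycopy(data, pos, dst, offset, len);
            pos += len;
        }
    }

    /** View of one sub-file: bytes [fileOffset, fileOffset + length) of the base stream. */
    static final class SliceStream {
        private final BaseStream base;
        private final int fileOffset;
        private final int length;
        private int filePointer; // position relative to the start of this sub-file

        SliceStream(BaseStream base, int fileOffset, int length) {
            this.base = base;
            this.fileOffset = fileOffset;
            this.length = length;
        }

        void seek(int pos) throws IOException {
            // Bound check: without it, a seek could land inside a neighbouring sub-file.
            if (pos < 0 || pos > length) {
                throw new IOException("seek outside sub-file: " + pos);
            }
            filePointer = pos;
        }

        void readBytes(byte[] dst, int offset, int len) throws IOException {
            // Bound check: never read past the end of this sub-file.
            if (len < 0 || len > length - filePointer) {
                throw new IOException("read past end of sub-file");
            }
            // Seek and read must be one atomic step, because other slices
            // share the same base stream and move its position too.
            synchronized (base) {
                base.seek(fileOffset + filePointer);
                base.readBytes(dst, offset, len);
            }
            filePointer += len;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] compound = "AAAABBBB".getBytes(StandardCharsets.US_ASCII);
        BaseStream base = new BaseStream(compound);
        SliceStream first = new SliceStream(base, 0, 4);

        byte[] buf = new byte[4];
        first.readBytes(buf, 0, 4);
        System.out.println(new String(buf, StandardCharsets.US_ASCII));

        try {
            first.readBytes(buf, 0, 1); // would read the first byte of the second sub-file
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the final read is rejected instead of silently returning a byte that belongs to the second sub-file, which is the case the bound checks are meant to rule out. Each SliceStream is assumed to be used by one thread at a time; only the base stream is shared.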