From: Dmitry Serebrennikov
Date: Fri, 26 Apr 2002 10:06:31 -0600
To: Lucene Developers List
Subject: Re: InputStream handling problem

Roman Rokytskyy wrote:

>>I'm sorry, I should have been more specific. The file handle is only in
>>the picture when FSInputStream is cloned. From what I can tell after a
>>quick look, InputStream is responsible for buffering and it delegates to
>>subclasses (via a call to readInternal) to refill the buffer from the
>>underlying data store. When cloned, the InputStream clones the buffer
>>(in the hope that the next read will still hit the buffered data, I
>>suppose), but after that it has its own seek position and its own
>>buffer. In the case of FSInputStream, there is a Descriptor object that
>>is shared between the clones. In the case of RAMInputStream, RAMFile is
>>the shared object.
>
>What is the reason to have a buffer with RAMInputStream? To have another
>copy of the same data?

Good point. Just goes to show that I shouldn't try to be an authority on
the topic without a more detailed look at the whole picture.
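To make the sharing described above concrete, here is a rough sketch of
a cloned stream that reuses its parent's open file handle while keeping
its own buffer and seek position. This is not the actual Lucene source;
SharedDescriptor and DemoInputStream are invented names for illustration.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // One OS-level handle, shared by a stream and all of its clones.
    class SharedDescriptor {
        final RandomAccessFile file;
        SharedDescriptor(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
        }
    }

    class DemoInputStream implements Cloneable {
        private final SharedDescriptor descriptor; // shared across clones
        private byte[] buffer = new byte[1024];    // private to each clone
        private long position = 0;                 // private to each clone

        DemoInputStream(String path) throws IOException {
            descriptor = new SharedDescriptor(path);
        }

        // Refill this clone's private buffer from the shared handle,
        // starting at this clone's own position. Assumes length fits in
        // the buffer and does not run past the end of the file.
        void readInternal(int length) throws IOException {
            synchronized (descriptor.file) {
                descriptor.file.seek(position);
                descriptor.file.readFully(buffer, 0, length);
            }
            position += length;
        }

        // A clone copies the buffer (hoping the next read still hits it)
        // but keeps pointing at the same descriptor, so the number of
        // open OS handles does not grow with the number of clones.
        public DemoInputStream clone() {
            try {
                DemoInputStream copy = (DemoInputStream) super.clone();
                copy.buffer = buffer.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

However many clones exist, they all funnel readInternal through the one
shared descriptor, so only a single OS handle is held per underlying file.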
>>Perhaps a factory pattern would be more flexible, but it looks like the
>>existing code does a pretty good job for the RAM and FS cases. Would the
>>factory pattern allow a better database implementation?
>
>It might. If you use an embedded database like JDataStore, you should not
>cache data internally; the database does this. So the buffer and cache
>simply introduce additional memory consumption.
>
>>I don't know, I have not heard many complaints about that code recently.
>
>Ok, I will try it "as is" with JDataStore, and if it works - fine.
>
>>There is activity in terms of creating a crawler / content handler
>>framework. There is also a need to handle "update" better, I think. For
>>example, I think it would be great to have deletes go through
>>IndexWriter and get "cached" in the new segment, to be later applied to
>>the prior segments during optimization. This would make deletes and adds
>>transactional.
>
>Ok, I will have a look, but I have almost no experience with Lucene.
>
>>Another thing on my wish / todo list is to reduce the number of OS files
>>that must be open. Once you get a lot of indexes, with a number of
>>stored fields, and keep re-indexing them, the number of open files grows
>>rather quickly. And if Lucene is part of another program that already
>>has other file IO needs, you quickly end up pushing into the max open
>>files limit of the OS. The idea I have for this one is to implement a
>>different kind of segment - one that is composed of a single file. Once
>>a segment is created by IndexWriter, it never changes (besides the
>>deletes), so it could easily be stored as a single file.
>
>I will check this thing with JDataStore. Maybe we could borrow a couple
>of ideas from them (like a built-in file system)... This would simplify
>life - one file for all indices, tx support?, backup, etc.

This JDataStore, I assume it is proprietary to Borland? The source isn't
available, is it? Probably many of the problems they address won't exist
in Lucene if we only use this for finished segments, since they will be
read-only. I think there are a lot of issues related to fragmentation and
growth of files that a filesystem has to address if it supports writing.

Dmitry.
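As a purely hypothetical illustration of the single-file segment idea
discussed above (the format and the name SingleFileSegmentWriter are
invented; this is not anything in Lucene or JDataStore): because a
finished segment never changes, its files could be concatenated behind a
small directory of name and length entries, after which a reader needs
only one open handle for the whole segment.

    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    class SingleFileSegmentWriter {
        // Pack a finished, read-only segment's files into one target file:
        // a directory of (name, length) entries, then the raw contents
        // concatenated in the same order. Offsets are implied by the
        // recorded lengths, so a reader can locate any sub-file after
        // parsing the directory once.
        static void pack(List<Path> segmentFiles, Path target) throws IOException {
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(target)))) {
                out.writeInt(segmentFiles.size());
                for (Path p : segmentFiles) {
                    out.writeUTF(p.getFileName().toString());
                    out.writeLong(Files.size(p));
                }
                for (Path p : segmentFiles) {
                    Files.copy(p, out);
                }
            }
        }
    }

A matching reader would open the packed file once and serve every
sub-file through that single stream, which is exactly what keeps the
open-file count down.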