From: Dmitry Serebrennikov
Date: Fri, 26 Apr 2002 10:06:31 -0600
To: Lucene Developers List
Subject: Re: InputStream handling problem

Roman Rokytskyy wrote:

>>I'm sorry, I should have been more specific. The file handle is only in
>>the picture when FSInputStream is cloned. From what I can tell after a
>>quick look, InputStream is responsible for buffering and it delegates to
>>subclasses (via a call to readInternal) to refill the buffer from the
>>underlying data store. When cloned, the InputStream clones the buffer
>>(in the hope that the next read will still hit the buffered data, I
>>suppose), but after that it has its own seek position and its own
>>buffer. In the case of FSInputStream, there is a Descriptor object that
>>is shared between the clones. In the case of RAMInputStream, RAMFile is
>>the shared object.
>
>What is the reason to have a buffer with RAMInputStream? To have another
>copy of the same data?

Good point. Just goes to show that I shouldn't try to be an authority on
the topic without a more detailed look at the whole picture.
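To make the sharing described above concrete, here is a rough sketch of
a cloned stream that reuses its parent's open file handle while keeping
its own buffer and seek position. This is not the actual Lucene source;
SharedDescriptor and DemoInputStream are invented names for illustration.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // One OS-level handle, shared by a stream and all of its clones.
    class SharedDescriptor {
        final RandomAccessFile file;
        SharedDescriptor(String path) throws IOException {
            file = new RandomAccessFile(path, "r");
        }
    }

    class DemoInputStream implements Cloneable {
        private final SharedDescriptor descriptor; // shared across clones
        private byte[] buffer = new byte[1024];    // private to each clone
        private long position = 0;                 // private to each clone

        DemoInputStream(String path) throws IOException {
            descriptor = new SharedDescriptor(path);
        }

        // Refill this clone's private buffer from the shared handle,
        // starting at this clone's own position. Assumes length fits in
        // the buffer and does not run past the end of the file.
        void readInternal(int length) throws IOException {
            synchronized (descriptor.file) {
                descriptor.file.seek(position);
                descriptor.file.readFully(buffer, 0, length);
            }
            position += length;
        }

        // A clone copies the buffer (hoping the next read still hits it)
        // but keeps pointing at the same descriptor, so the number of
        // open OS handles does not grow with the number of clones.
        public DemoInputStream clone() {
            try {
                DemoInputStream copy = (DemoInputStream) super.clone();
                copy.buffer = buffer.clone();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

However many clones exist, they all funnel readInternal through the one
shared descriptor, so only a single OS handle is held per underlying file.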
>>Perhaps a factory pattern would be more flexible, but it looks like the
>>existing code does a pretty good job for the RAM and FS cases. Would the
>>factory pattern allow a better database implementation?
>
>It might. If you use an embedded database like JDataStore, you should not
>cache data internally; the database does this. So the buffer and cache
>simply introduce additional memory consumption.
>
>>I don't know, I have not heard many complaints about that code recently.
>
>Ok, I will try it "as is" with JDataStore, and if it works - fine.
>
>>There is activity in terms of creating a crawler / content handler
>>framework. There is also a need to handle "update" better, I think. For
>>example, I think it would be great to have deletes go through
>>IndexWriter and get "cached" in the new segment, to be later applied to
>>the prior segments during optimization. This would make deletes and adds
>>transactional.
>
>Ok, I will have a look, but I have almost no experience with Lucene.
>
>>Another thing on my wish / todo list is to reduce the number of OS files
>>that must be open. Once you get a lot of indexes, with a number of
>>stored fields, and keep re-indexing them, the number of open files grows
>>rather quickly. And if Lucene is part of another program that already
>>has other file IO needs, you quickly end up pushing into the max open
>>files limit of the OS. The idea I have for this one is to implement a
>>different kind of segment - one that is composed of a single file. Once
>>a segment is created by IndexWriter, it never changes (besides the
>>deletes), so it could easily be stored as a single file.
>
>I will check this thing with JDataStore. Maybe we could borrow a couple
>of ideas from them (like a built-in file system)... This would simplify
>life - one file for all indices, tx support?, backup, etc.

This JDataStore, I assume it is proprietary to Borland? The source isn't
available, is it? Probably many of the problems they address won't exist
in Lucene if we only use this for finished segments, since they will be
read-only. I think there are a lot of issues related to fragmentation and
growth of files that a filesystem has to address if it supports writing.

Dmitry.
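As a purely hypothetical illustration of the single-file segment idea
discussed above (the format and the name SingleFileSegmentWriter are
invented; this is not anything in Lucene or JDataStore): because a
finished segment never changes, its files could be concatenated behind a
small directory of name and length entries, after which a reader needs
only one open handle for the whole segment.

    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    class SingleFileSegmentWriter {
        // Pack a finished, read-only segment's files into one target file:
        // a directory of (name, length) entries, then the raw contents
        // concatenated in the same order. Offsets are implied by the
        // recorded lengths, so a reader can locate any sub-file after
        // parsing the directory once.
        static void pack(List<Path> segmentFiles, Path target) throws IOException {
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(target)))) {
                out.writeInt(segmentFiles.size());
                for (Path p : segmentFiles) {
                    out.writeUTF(p.getFileName().toString());
                    out.writeLong(Files.size(p));
                }
                for (Path p : segmentFiles) {
                    Files.copy(p, out);
                }
            }
        }
    }

A matching reader would open the packed file once and serve every
sub-file through that single stream, which is exactly what keeps the
open-file count down.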