lucene-dev mailing list archives

From "Roman Rokytskyy" <>
Subject RE: InputStream handling problem
Date Fri, 26 Apr 2002 10:22:42 GMT
> I'm sorry, I should have been more specific. The file handle is only in
> the picture when FSInputStream is cloned. From what I can tell after a
> quick look, InputStream is responsible for buffering and it delegates to
> subclasses (via a call to readInternal) to refill the buffer from the
> underlying data store. When cloned, the InputStream clones the buffer
> (in the hope that the next read will still hit the buffered data I
> suppose), but after that it has its own seek position and its own
> buffer. In the case of FSInputStream, there is a Descriptor object that
> is shared between the clones. In the case of RAMInputStream - RAMFile is
> the shared object.
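The sharing described above might be sketched roughly like this (a minimal illustration only, with invented names -- not Lucene's actual classes): each clone gets its own buffer and seek position, while the underlying data (standing in for FSInputStream's Descriptor or RAMInputStream's RAMFile) is shared.

```java
// Illustrative sketch, not Lucene code: per-clone buffer and position,
// shared underlying store.
class SharedDataStream implements Cloneable {
    private final byte[] shared;          // shared between clones (the "Descriptor")
    private byte[] buffer = new byte[4];  // per-clone buffer
    private int bufferStart = -1;         // offset of buffer[0] in the store; -1 = empty
    private long position = 0;            // per-clone seek position

    SharedDataStream(byte[] data) { this.shared = data; }

    byte readByte() {
        int p = (int) position;
        if (bufferStart < 0 || p < bufferStart || p >= bufferStart + buffer.length) {
            // refill from the shared store (the "readInternal" step described above)
            bufferStart = p;
            int n = Math.min(buffer.length, shared.length - p);
            System.arraycopy(shared, p, buffer, 0, n);
        }
        position++;
        return buffer[p - bufferStart];
    }

    void seek(long pos) { this.position = pos; }

    @Override
    public SharedDataStream clone() {
        try {
            SharedDataStream c = (SharedDataStream) super.clone();
            // copy the buffer, in the hope the clone's next read still hits it
            c.buffer = buffer.clone();
            return c;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```

After cloning, the two streams can seek and read independently without disturbing each other, while only one copy of the underlying data exists.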

What is the reason to have a buffer in RAMInputStream? Doesn't it just keep
another copy of the same data?

> Perhaps a factory pattern would be more flexible, but it looks like the
> existing code does a pretty good job for the RAM and FS cases. Would the
> factory pattern allow a better database implementation?

It might. If you use an embedded database like JDataStore, you should not
cache data internally; the database already does that. So the buffer and
cache simply introduce additional memory consumption.
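To make the factory idea concrete, here is a hedged sketch (the names StreamFactory and RamStore are my own, not Lucene APIs): the store decides how, and whether, to buffer, so a database-backed implementation could return an unbuffered stream and leave caching to the database.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative factory sketch: callers depend only on the interfaces,
// so a JDataStore-backed store could plug in without extra buffering.
interface DataInput {
    byte readByte();
}

interface StreamFactory {
    DataInput openInput(String name);
}

// RAM-backed store: data is already in memory, so no buffer is needed here either.
class RamStore implements StreamFactory {
    private final Map<String, byte[]> files = new HashMap<>();

    void put(String name, byte[] data) { files.put(name, data); }

    public DataInput openInput(String name) {
        byte[] data = files.get(name);
        return new DataInput() {
            int pos = 0;
            public byte readByte() { return data[pos++]; }
        };
    }
}
```

A database-backed store would implement the same StreamFactory interface but read directly from the database's pages, avoiding the double caching mentioned above.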

> I don't know, I have not heard many complaints about that code recently.

OK, I will try it "as is" with JDataStore, and if it works, fine.

> There is activity in terms of creating a crawler / content handler
> framework. There is also a need to handle "update" better, I think. For
> example, I think it would be great to have deletes go through
> IndexWriter and get "cached" in the new segment, to be later applied to
> the prior segments during optimization. This would make deletes and adds
> transactional.

Ok, I will have a look, but I have almost no experience with Lucene.

> Another thing on my wish / todo list is to reduce the number of OS files
> that must be open. Once you get a lot of indexes, with a number of
> stored fields, and keep re-indexing them, the number of open files grows
> rather quickly. And if Lucene is part of another program that already
> has other file IO needs, you end up quickly pushing into the max open
> files limit of the OS. The idea I have for this one is to implement a
> different kind of segment - one that is composed of a single file. Once
> a segment is created by IndexWriter, it never changes (besides the
> deletes), so it could easily be stored as a single file.
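Since a segment never changes once written, the single-file idea could look something like the following sketch: concatenate the segment's files into one container and keep a small table of contents mapping each sub-file name to an offset and length. (This is only an illustration of the concept, with made-up class names, not a proposed on-disk format.)

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Writes several sub-files into one container byte stream,
// recording (offset, length) for each name.
class CompoundWriter {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final Map<String, int[]> toc = new LinkedHashMap<>();

    void addFile(String name, byte[] contents) {
        toc.put(name, new int[]{data.size(), contents.length});
        data.write(contents, 0, contents.length);
    }

    byte[] build() { return data.toByteArray(); }

    Map<String, int[]> tableOfContents() { return toc; }
}

// Reads a sub-file back out of the container using the table of contents.
class CompoundReader {
    private final byte[] container;
    private final Map<String, int[]> toc;

    CompoundReader(byte[] container, Map<String, int[]> toc) {
        this.container = container;
        this.toc = toc;
    }

    byte[] readFile(String name) {
        int[] entry = toc.get(name);  // {offset, length}
        byte[] out = new byte[entry[1]];
        System.arraycopy(container, entry[0], out, 0, entry[1]);
        return out;
    }
}
```

With something like this, the whole segment needs only one OS file handle, however many logical files it contains.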

I will check this with JDataStore. Maybe we could borrow a couple of ideas
from them (like the built-in file system)... That would simplify life: one
file for all indices, transaction support(?), backup, etc.

Roman Rokytskyy

