lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From (Steven J. Owens)
Subject Re: Homogeneous vs Heterogeneous indexes (was: FileNotFoundException)
Date Mon, 29 Apr 2002 23:57:29 GMT

On Mon, Apr 29, 2002 at 07:54:43PM +0200, petite_abeille wrote:
> As a final note, several people suggested to increase the number of file 
> descriptors per process with something like "ulimit"...

     Just be glad you aren't doing this on Solaris with JDK 1.1.6,
where I first ran into ulimit issues - back when I encountered this
problem, the solaris default ulimit setting was 24 files, and JDK
1.1.6 reported the problem as an "OutOfMemory" error!  Looks like
things are improving :-).

> From what I learned today, I think it's a *bad* idea to have to
> change some system parameters just because your/my app is written in
> such a way that it may run out of some system resources. Your/my app
> has to fit in the system.  Hacking "ulimit" and/or other system
> parameters is just a quick patch that will -at best- delay dealing
> with the real problem that's usually one of design.

     Yes and no.  Setting ulimit to a reasonable number of open files
is not only not a patch, it's the "right" way to do it.  I understand
where you're coming from, really, and in a certain way, it makes
sense, BUT... sometimes the impulse for clean, good design takes you
too far down a blind alley.  Sometimes there is no elegant solution.
Sometimes there is no "best" way, only one of a limited set of options
with different tradeoffs.

     By definition, Lucene is an application that trades off up front
CPU (for indexing) and file resources (for storage) for request-time
speed.  The OS's job is to manage resources, and open files are one of
those resources.  That's the tradeoff here, and it's reasonable and
expected.  Most serious applications have to have some sort of OS
variable tweaking, you're just used to having it done invisibly and

     That said, since you're working on a client/desktop application,
not a server application, you need to think about ways to handle this:

     You could figure out the "right" way to set the system
configuration on install or launch.

     You could look at the alternative techniques for indexing in
Lucene, and see if any approaches there can help - for example, maybe
doing a lot of the more intense indexing work in a RAMDirectory, then
merging it into a normal file-based Directory.

     You could look more closely at what your application is doing,
and see if there's anything you're doing wrong (perhaps opening files
and not closing them, and leaving them for the garbage collector to
eventually get around to closing?) or if you have a pessimal usage
pattern that exacerbates the situation.

     You could take a closer look at the lucene indexing and file
management stuff, and see if you can come up with a scheme to run
Lucene indexing with modified code for keeping track of file

     I'll bet Doug and the other developers would rather not add
open-file managmeent as a main, permanent part of lucene, since it
would add overhead to all uses of lucene just to deal with an
anomalous situation (use on a client/desktop machine).  But they might
be interested in a way to offer it as an optional feature, where
people using lucene in a constrained environment could configure
lucene to be careful about how many files it keeps open at any given

Steven J. Owens

To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message