lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From puffm...@darksleep.com (Steven J. Owens)
Subject Re: rc4 and FileNotFoundException: an update
Date Sun, 28 Apr 2002 01:09:24 GMT

On Fri, Apr 26, 2002 at 07:05:23PM +0200, petite_abeille wrote:
> I guess it's really not my day...
> [...]
> Well, it's pretty ugly. Whatever I'm doing with Lucene in the previous 
> package (com.lucene) is magnified many folds in rc4. After processing a 
> paltry 16 objects I got:
> 
> "SZFinder.findObjectsWithSpecificationInStore: 
> java.io.FileNotFoundException: _2.f14 (Too many open files)"

     Sounds like a pretty nasty situation.  

     One suggestion I have for you is that Doug is usually very
helpful with problems like this IF you can first narrow down what is
happening to the point that you can post a clear, specific, isolated
test that consistently causes the problem to happen.  This makes sense
- any effort to solve the problem will first involve isolating the
bug, and that's a task you're best suited for, since you know your
system best.

     So maybe your best approach would be to take a copy of your
system as above, and start gradually stripping out stuff, testing
between each run, until you have most of the application-specific
stuff removed, but the problem is still reoccurring consistently.
Then post your code and ask if some of the more lucene-knowledgable
can take a look.

     Re: index integrity, I agree that it would be really, really nice
to have some sort of "sanity" check.  I have yet to actually get into
the internals of the index, but I'd guess that there must be some sort
of at least superficial check, maybe some sort of format check.  

     If I was going to kludge something together, the first approach
I'd take would be to just open the index and roll through all of the
Documents in it, accessing all of the fields (or maybe just a few main
fields per Document).  I"m not sure what I'd *do* with the field
values (printing them out to the screen might take a while), other
than perhaps checking for nulls.  But I suspect that if the code gets
throught that without causing an exception or getting null values,
then at least the index's internal format is intact.  Maybe the test
code could save the number of lucene Document objects in the index in
between checks (and, of course, update this number when you add or
remove documents), and make sure it still has the right number of
documents.

     As for repairing an index, I think that's working sort of against
the grain of Lucene.  In your case, it sounds like rebuilding the
index is important, because you're using Lucene as a data store.  I
have some similar issues myself in some things I want to build (I end
up wanting both a data store and a search index; ultimately I've ended
up choosing to have a separate data store for the extra data).  But
Lucene is a search index, meant to be used more in a cache-like style,
so there's an underlying assumption that the original data is always
around to reindex.  Thus, repairing an index is less important, since
it is assumed you can always rebuild it.  

     I don't know much of the theories behind data store systems.  It
occurs to me that using Lucene as a data store, you'll always be
working against the grain, always swimming upstream.  Maybe it'd be a
better idea to figure out some way to use Lucene as the indexing
technology in a data store, the way traditional RDBMSes use indexes,
for speeding access.  

     Or possibly you should look at Xindice (http://xml.apache.org/xindice/)
which is an XML database.  You might find it easier to adapt that to your
needs.  I'm kind of curious as to how fast Xindice's XPath execution is, and
what their indexing is based on - there might be a use for Lucene there.

Steven J. Owens
puff@darksleep.com

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message