lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: Field compression too slow
Date Thu, 10 Aug 2006 17:07:43 GMT
> I have "assumed" I can't have two threads writing to the index 
> concurrently,
> so have implemented my own read/write locking system.  Are you saying I
> don't need to bother with this? My reading of the doco suggests that you
> shouldn't have two IndexWriters open on the same index.
> I know that if I try a search from a different JVM while the index is being
> written I get the odd "FileNotFound" exception, so I had assumed writing
> concurrently would be a bigger problem.
> Of course there is a difference between multiple threads in a single JVM,
> and threads in multiple JVM's (which is my situation).  But I may be 
> able to
> re-architect so I have a single JVM reading/writing the one index if it 
> will
> allow me to ignore my own locking/unlocking system.

You are right: only one "writer" (= an IndexWriter adding docs or an
IndexReader deleting docs) may be open at a time, but you can have
multiple threads within one JVM sharing that writer, and they should
nicely parallelize.  It sounds like your situation can't take
advantage of multiple threads on one writer...
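
Roughly, sharing one writer looks like this (a minimal sketch against
the Lucene 2.0-era API; the index path and field names are made up):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class SharedWriterSketch {
    public static void main(String[] args) throws Exception {
      // One writer for the whole JVM; addDocument() is safe to call
      // concurrently from many threads.
      final IndexWriter writer =
          new IndexWriter("/path/to/index", new StandardAnalyzer(), true);

      Thread[] threads = new Thread[4];
      for (int i = 0; i < threads.length; i++) {
        final int id = i;
        threads[i] = new Thread(new Runnable() {
          public void run() {
            try {
              for (int j = 0; j < 1000; j++) {
                Document doc = new Document();
                doc.add(new Field("id", id + "-" + j,
                    Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("body", "text for doc " + j,
                    Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);   // shared writer, no external locking
              }
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
        threads[i].start();
      }
      for (int i = 0; i < threads.length; i++) {
        threads[i].join();
      }
      writer.close();                      // close once, after all threads finish
    }
  }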

Separately, multiple IndexSearchers can be instantiated in different
JVMs.  Multiple threads within one JVM should share a single
IndexSearcher and will nicely parallelize.
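
On the search side that sharing looks something like this (again a
sketch; the IndexSearcher(String) constructor, index path, and field
name are just illustrative):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;

  public class SearchService {
    // Opened once at startup and reused; IndexSearcher.search() is
    // safe to call from many threads at once.
    private final IndexSearcher searcher;

    public SearchService(String indexPath) throws Exception {
      searcher = new IndexSearcher(indexPath);
    }

    // Called concurrently from many request threads.
    public Hits search(String queryText) throws Exception {
      Query q = new QueryParser("body", new StandardAnalyzer()).parse(queryText);
      return searcher.search(q);
    }
  }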

However, IndexSearchers & IndexWriters must synchronize to make sure
you don't get that FileNotFound exception.  Basically, every time a
new IndexReader (used by IndexSearcher) is instantiated it needs to
ensure no IndexWriter is in the process of committing (writing a new
segments file).

This is currently implemented with file-based locks, but this method
of locking has known bugs on remotely mounted filesystems (it sounds
like this may be your use case?).

If you need to share an index on a remote filesystem, you either need
to do your own locking (sounds like you've done this), or take an
approach like the Solr project, where you take "known safe" snapshots
of the index and each searcher cuts over to the latest snapshot when
it's ready.
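
In outline, that snapshot approach can look something like the sketch
below.  The findLatestSnapshot() helper and the "snapshot-*" directory
naming are hypothetical, and how the snapshots actually get published
(hard links, rsync, etc.) is not shown:

  import java.io.File;
  import org.apache.lucene.search.IndexSearcher;

  public class SnapshotSearcher {
    private volatile IndexSearcher current;
    private String currentPath;

    // Call this from a timer thread, or before each batch of searches.
    public synchronized void maybeCutOver(File snapshotRoot) throws Exception {
      File latest = findLatestSnapshot(snapshotRoot);
      if (latest != null && !latest.getPath().equals(currentPath)) {
        IndexSearcher newSearcher = new IndexSearcher(latest.getPath());
        IndexSearcher old = current;
        current = newSearcher;
        currentPath = latest.getPath();
        if (old != null) {
          old.close();  // in practice, wait for in-flight searches to finish first
        }
      }
    }

    public IndexSearcher getSearcher() {
      return current;
    }

    // Hypothetical helper: picks the newest "snapshot-*" directory by name.
    private File findLatestSnapshot(File root) {
      File[] dirs = root.listFiles();
      File newest = null;
      if (dirs != null) {
        for (int i = 0; i < dirs.length; i++) {
          if (dirs[i].isDirectory() && dirs[i].getName().startsWith("snapshot-")
              && (newest == null
                  || dirs[i].getName().compareTo(newest.getName()) > 0)) {
            newest = dirs[i];
          }
        }
      }
      return newest;
    }
  }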

> As it turns out I have devised an alternate strategy.  Storing large 
> amounts
> of data in the index (compressed or not) seems to have the secondary effect
> of slowing down retrieval of results... and even led to OutOfMemory errors
> for me (presumably because the hits.doc(n) call loads the stored fields 
> into
> memory?).
> I needed to store the contents of all fields, so when I re-index the
> document (as some fields change) I don't lose this data (my kingdom for the
> ability to "update" a field!).  I decided to store the "large" data
> elsewhere outside the index (where I can store/compress it asynchronously)
> and pull it out from here when I need to re-index.

Yes, when you load a doc through Hits.doc(n) it will load all stored
fields.  There have been some good recent fixes in this area (after
the 2.0 release, I believe), including the ability to mark fields for
"lazy loading", and the ability to load a document by specifying only
the subset of fields that you actually want.  See for juicy details.
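
I believe those fixes are the FieldSelector additions that appeared
after 2.0 (around Lucene 2.1).  A rough sketch, assuming that API
(the index path, doc id, and field names are made up):

  import java.util.Collections;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.FieldSelector;
  import org.apache.lucene.document.Fieldable;
  import org.apache.lucene.document.SetBasedFieldSelector;
  import org.apache.lucene.index.IndexReader;

  public class LazyLoadSketch {
    public static void main(String[] args) throws Exception {
      IndexReader reader = IndexReader.open("/path/to/index");

      // Eagerly load "title", lazily load the large "contents" field;
      // every other stored field is skipped entirely.
      FieldSelector selector = new SetBasedFieldSelector(
          Collections.singleton("title"),
          Collections.singleton("contents"));

      Document doc = reader.document(0, selector);
      String title = doc.get("title");            // already loaded
      Fieldable contents = doc.getFieldable("contents");
      if (contents != null) {
        String text = contents.stringValue();     // bytes read only on this call
      }
      reader.close();
    }
  }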

> Thanks again for the response.

You're welcome!

