lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Polites" <>
Subject Re: Field compression too slow
Date Thu, 10 Aug 2006 15:49:06 GMT
Thanks for the Jira issue...

one question on your synchronization comment...

I have "assumed" I can't have two threads writing to the index concurrently,
so have implemented my own read/write locking system.  Are you saying I
don't need to bother with this? My reading of the doco suggests that you
shouldn't have two IndexWriters open on the same index.

I know that if I try a search from a different JVM while the index is being
written I get the odd "FileNotFound" exception, so I had assumed writing
concurrently would be a bigger problem.

Of course there is a difference between multiple threads in a single JVM,
and threads in multiple JVM's (which is my situation).  But I may be able to
re-architect so I have a single JVM reading/writing the one index if it will
allow me to ignore my own locking/unlocking system.

As it turns out I have devised an alternate strategy.  Storing large amounts
of data in the index (compressed or not) seems to have the secondary effect
of slowing down retrieval of results... and even led to OutOfMemory errors
for me (presumably because the hits.doc(n) call loads the stored fields into

I needed to store the contents of all fields, so when I re-index the
document (as some fields change) I don't lose this data (my kingdom for the
ability to "update" a field!).  I decided to store the "large" data
elsewhere outside the index (where I can store/compress it asynchronously)
and pull it out from here when I need to re-index.

Thanks again for the response.

On 8/11/06, Michael McCandless <> wrote:
> >> I'm not sure if it would help my particular situation, but is there
> >> any way
> >> to provide the option of specifying the compression level?  The level
> >> used
> >> by Lucene (level 9) is the maximum possible compression level.  Ideally
> I
> >> would like to be able to alter the compression level on the basis of
> the
> >> field size.  This way I can smooth out the compression times across the
> >> various document sizes.  I am more interested in consistent time than
> >> I am
> >> consistent compression.
> >
> > I agree, we should make the compression level configurable.  It's
> > disturbing that it takes minutes to compress a 4.5 MB document!  I'll
> > open a Jira issue for this.
> OK I created to track
> this issue.
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message