lucene-java-user mailing list archives

From "Jason Polites" <jason.poli...@gmail.com>
Subject Field compression too slow
Date Thu, 10 Aug 2006 12:41:36 GMT
Hello all,

I am experiencing some performance problems indexing large(ish) amounts of
text using the Field.Store.COMPRESS option when creating a Field in
Lucene.

I have a sample document which has about 4.5MB of text to be stored as
compressed data within the field, and the indexing of this document seems to
take an inordinate amount of time (over 10 minutes!).  When debugging I can
see that it's stuck on the deflate() calls of the Deflater used by Lucene.

I noted that Lucene by default uses the Deflater.BEST_COMPRESSION
compression level when encountering a compressed field.

I'm not sure if it would help my particular situation, but is there any way
to provide the option of specifying the compression level?  The level used
by Lucene (level 9) is the maximum possible compression level.  Ideally I
would like to be able to alter the compression level on the basis of the
field size.  This way I can smooth out the compression times across the
various document sizes.  I am more interested in consistent time than I am
consistent compression.
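
To illustrate what I mean by choosing the level from the field size, here is
a rough sketch using java.util.zip.Deflater directly. This is not Lucene's
API: the class name SizeAwareCompressor, the levelFor() helper and the size
thresholds are all made up for illustration, and the compression happens
outside Lucene, so the resulting bytes would have to be stored as an
uncompressed binary value and inflated again by the application on retrieval.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene: picks a Deflater level from the
// input size so that large fields trade compression ratio for speed.
class SizeAwareCompressor {

    // Thresholds are illustrative assumptions; tune them against real data.
    static int levelFor(int length) {
        if (length < 64 * 1024) {
            return Deflater.BEST_COMPRESSION; // level 9 for small fields
        }
        if (length < 1024 * 1024) {
            return 6;                         // middle-of-the-road level
        }
        return Deflater.BEST_SPEED;           // level 1 for very large fields
    }

    // Compresses the whole input at the given level.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out =
                new ByteArrayOutputStream(Math.max(32, input.length / 4));
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    // Inverse of compress(); the level is not needed to inflate.
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream(data.length * 4);
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Usage would be along the lines of
compress(bytes, SizeAwareCompressor.levelFor(bytes.length)), with the text
converted to bytes first. Timing the same document at level 9 and level 1
this way would also show whether the compression level is really what makes
the deflate() calls slow.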

Or... could there be some other reason my document takes this long to index
(and holds up all other threads)?

Thanks.
