lucene-dev mailing list archives

From Andrzej Bialecki
Subject Re: Binary fields and data compression
Date Tue, 31 Aug 2004 06:33:33 GMT
Robert Engels wrote:
> My estimates are based on our own projects, where we see that wrapping
> an InputStream in an InflaterInputStream takes about 20% of the CPU
> time, so whether to actually use it will depend on whether the
> IndexReader is CPU-bound or I/O-bound.
> The problem with "after the read" decompression is that you still incur
> the decompression overhead each time the file block is accessed, since
> the OS caches only the compressed block (unless Lucene adds caching to
> the index read operations). The disk I/O time, however, is almost
> always eliminated if the index reader frequently accesses the same file
> blocks, since the OS caches the data block.
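To make the tradeoff concrete, here is a minimal sketch of the kind of round trip being discussed, using the standard java.util.zip classes rather than any Lucene API. The class and method names are illustrative, not from Lucene; the point is that the decompress step runs on every access to the stored bytes, which is the per-read CPU cost weighed against the I/O savings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StoredFieldZip {

    // Compress a stored field's bytes with DEFLATE (paid once, at write time).
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(raw);
        }
        return bos.toByteArray();
    }

    // Decompress the bytes -- this is the per-access CPU cost discussed
    // above, paid each time the field value is read from a cached block.
    static byte[] decompress(byte[] packed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InflaterInputStream iis =
                 new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = iis.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original =
            "a stored field value".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Whether this wins overall depends, as noted, on whether the reader is CPU-bound or I/O-bound: compression trades extra CPU per read for fewer bytes moved from disk.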

As I understand the original proposal, compression would be used mostly 
for reading the data of STORED fields. When it comes to inverted lists, 
which are the main data structure used for searching over indexed 
fields, they are already "compressed" in a highly-optimized way, so 
adding another level of compression to this part wouldn't make much 
sense IMHO.
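For readers unfamiliar with why the inverted lists already count as "compressed": Lucene's file format stores posting lists as gaps between sorted document numbers, encoded as variable-length integers (VInts), so small gaps cost a single byte. The sketch below illustrates that scheme in isolation; it is not Lucene source code, just the same encoding idea applied to a hypothetical posting list.

```java
import java.io.ByteArrayOutputStream;

public class VIntDeltas {

    // Write v as a variable-length integer: 7 bits per byte, low-order
    // bits first, high bit set on every byte except the last (the VInt
    // format used in Lucene's index files).
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    // Encode a sorted posting list as gaps, each gap as a VInt.
    // Dense terms produce small gaps, so most entries fit in one byte.
    static byte[] encodePostings(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : sortedDocIds) {
            writeVInt(out, id - prev);
            prev = id;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] docs = {5, 7, 12, 13, 130};
        // Gaps are 5, 2, 5, 1, 117 -- all under 128, one byte each.
        System.out.println(encodePostings(docs).length
            + " bytes for " + docs.length + " docs"); // prints "5 bytes for 5 docs"
    }
}
```

General-purpose DEFLATE has little to gain on top of output like this, which is why layering it over the posting lists would mostly add CPU cost.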

> ... thus my request that any compression support be optional.

Absolutely. :-)

Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer

