lucene-general mailing list archives

From Otis Gospodnetic
Subject Re: Index Ratio
Date Thu, 25 Jun 2009 04:06:13 GMT

Hi Brett,

Try creating a simple MS Word document with just a single character in it.  Save it as .doc
and check the size.  Then export it to PDF and check the size again.  I don't know exactly
how big those files will be, but I bet they'll be many, many times larger than that one-byte
character.  File formats like these carry a lot of overhead that never makes it into the
index.  Open up your index with Luke to see what's in it.
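
For anyone who wants to put numbers on this, here is a rough Python sketch of the arithmetic and of the container overhead. It is only an illustration: `index_ratio` and `container_size_for` are hypothetical helper names, and a ZIP archive is used as a stand-in for a document container, since .doc and PDF use different layouts.

```python
import io
import zipfile


def index_ratio(index_bytes: float, source_bytes: float) -> float:
    """Return the index size as a fraction of the source size."""
    if source_bytes <= 0:
        raise ValueError("source_bytes must be positive")
    return index_bytes / source_bytes


def container_size_for(text: str) -> int:
    """Size in bytes of a ZIP archive holding `text` as its only entry.

    Assumption: ZIP is just a convenient stand-in for a container format.
    .doc and PDF are laid out differently, but they add overhead too.
    """
    buf = io.BytesIO()
    # ZIP_STORED means no compression, so any extra bytes are pure overhead.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("document.txt", text)
    return len(buf.getvalue())


# Figures from the original question: 1.7MB index, 145MB of source files.
ratio = index_ratio(1.7, 145)
print(f"index/source ratio: {ratio:.2%}")

# One character of text wrapped in a container.
payload = "a"
wrapped = container_size_for(payload)
print(f"{len(payload.encode('utf-8'))} byte of text -> {wrapped} bytes in a container")
```

The ratio only says something about the index if the denominator is the extracted text rather than the raw files, so measuring the size of the parsed text before indexing gives a fairer comparison.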

Sematext -- Lucene - Solr - Nutch

----- Original Message ----
> From: pof
> Sent: Wednesday, June 24, 2009 8:47:39 PM
> Subject: Index Ratio
> Hi, I just completed a batch test index of ~1100 documents of various file
> types and I noticed that the original documents take up about 145MB but my
> index is only 1.7MB? I remember reading somewhere that the typical
> compression rate is about 20-30% or something, but mine is a little over 1%!
> I'm not complaining or anything; it just struck me as odd, especially as I
> have a lot of archive files and emails with attachments that I parse as well.
> Has anyone else experienced something like this? I'm just curious.
> Cheers. Brett.
