lucene-general mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Index Ratio
Date Thu, 25 Jun 2009 04:06:13 GMT

Hi Brett,

Try creating a simple MS Word document containing just a single character.  Save it as .doc
and check the size, then export it to PDF and check that size too.  I don't know exactly how
big those files will be, but I bet they'll be many, many times larger than the one byte that
character takes up.  In other words, a lot of your 145MB is probably file-format overhead
rather than indexable text.  Open up your index with Luke to see what's in it.
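
If you want to put a number on it, here is a rough Python sketch (not anything from Lucene
itself) that computes the index-to-source ratio from the sizes you quoted.  The helper names
dir_size_bytes and index_ratio are made up for illustration, and the 145MB / 1.7MB figures
are just the ones from your message.

```python
import os


def dir_size_bytes(path):
    """Sum the on-disk size of every file under `path` (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def index_ratio(index_bytes, source_bytes):
    """Return index size as a fraction of the source document size."""
    if source_bytes <= 0:
        raise ValueError("source size must be positive")
    return index_bytes / source_bytes


# Figures taken from the original question: ~145MB of documents, 1.7MB index.
MB = 1024 * 1024
ratio = index_ratio(1.7 * MB, 145 * MB)
print(f"index is {ratio:.2%} of the source size")
```

On real data you could pass dir_size_bytes() of your index directory and of your document
directory instead of the hard-coded figures.  Comparing the index against the size of the
extracted text, rather than the raw files, would probably give a ratio closer to what you
were expecting.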

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: pof <MelbourneBeerBaron@gmail.com>
> To: general@lucene.apache.org
> Sent: Wednesday, June 24, 2009 8:47:39 PM
> Subject: Index Ratio
> 
> 
> Hi, I just completed a batch test index of ~1100 documents of various file
> types and I noticed that the original documents take up about 145MB but my
> index is only 1.7MB?? I remember reading somewhere that the typical
> compression rate is about 20-30% or something, but mine is a little over 1%!
> I'm not complaining or anything, it just struck me as odd, especially as I have
> a lot of archive files and emails with attachments that I parse as well. Has
> anyone else experienced something like this? I'm just curious.
> 
> Cheers. Brett.
> -- 
> View this message in context: 
> http://www.nabble.com/Index-Ratio-tp24195272p24195272.html
> Sent from the Lucene - General mailing list archive at Nabble.com.

