lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: how to reasonably estimate the disk size for Lucene 4.x
Date Tue, 10 Mar 2015 15:38:42 GMT
In a word... no. There are simply too many variables here to give any
decent estimate.

The spreadsheet is, at best, an estimate. It hasn't been put through
any rigorous QA so the fact that it's off in your situation is not
surprising. I wish we had a better answer.

And the disk size isn't particularly interesting anyway. The *.fdt and
*.fdx files contain compressed copies of the raw data in _stored_
fields. If I index the same data with all fields set stored="true" and
then stored="false", the disk size may vary by a large factor. But
stored data costs very little memory, and memory is usually the
limiting factor in your Solr installation.

Are you storing position information? Term vectors? Are you ngramming
your fields? And so on. Each and every one of these changes the
memory requirements...
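Erick's point about how these options multiply can be illustrated with a back-of-the-envelope estimator. This is only a sketch: every per-document byte cost below is an invented placeholder, not a measured Lucene constant. It exists purely to show why flipping options such as stored fields, positions, or term vectors can swing an estimate by a large factor.

```java
// Hypothetical back-of-the-envelope index size estimator.
// All per-document byte costs are invented for illustration;
// they are NOT measured Lucene constants.
public class IndexSizeSketch {

    static long estimateBytes(long numDocs,
                              boolean storedFields,
                              boolean positions,
                              boolean termVectors) {
        long perDoc = 200;                 // assumed bytes/doc for inverted terms alone
        if (storedFields) perDoc += 1_500; // assumed compressed stored-field cost
        if (positions)    perDoc += 300;   // assumed positions/offsets cost
        if (termVectors)  perDoc += 700;   // assumed term-vector cost
        return numDocs * perDoc;
    }

    public static void main(String[] args) {
        long docs = 250_000_000L; // the 250M documents from the question below
        long lean = estimateBytes(docs, false, false, false);
        long full = estimateBytes(docs, true, true, true);
        System.out.printf("lean: %d GB, full: %d GB, ratio: %.1fx%n",
                lean >> 30, full >> 30, (double) full / lean);
    }
}
```

With these made-up costs, the same corpus differs by more than an order of magnitude between the leanest and heaviest configurations, which is exactly why a one-size-fits-all spreadsheet struggles.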

Sorry we can't be of more help.

On Mon, Mar 9, 2015 at 12:20 PM, Gaurav gupta
<> wrote:
> Could you please guide me how to reasonably estimate the disk size for
> Lucene 4.x (precisely 4.8.1 version) including worst case scenario.
> I have referred to the formula and Excel sheet shared @
> It seems to have been devised for Lucene 2.9, and I am not sure whether it
> holds true for the 4.x versions.
> In my case, the actual index size is coming out close to the worst case, or
> higher. One of our enterprise customers has even observed an index size 3
> times higher than the estimate (based on the Excel sheet).
> Alternatively, can I measure the average document size in a Lucene index
> (built from a reasonable sample of data) and extrapolate it to the complete
> 250 million documents?
> Thanks
> Gaurav
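The extrapolation Gaurav asks about can be sketched without any Lucene-specific API: index a representative sample, sum the on-disk size of the index directory, and scale the bytes-per-document figure to the full corpus. The `sample-index` path and the document counts below are placeholders, and, per Erick's caveats above, the result is only meaningful if the sample uses the same fields, analyzers, and options as production.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Extrapolate total index size from a sample index on disk.
public class ExtrapolateIndexSize {

    // Sum the sizes of all regular files directly inside the index directory.
    static long directoryBytes(Path indexDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) total += Files.size(f);
            }
        }
        return total;
    }

    // Bytes-per-doc from the sample, scaled to the target corpus size.
    static long extrapolate(long sampleBytes, long sampleDocs, long targetDocs) {
        double perDoc = (double) sampleBytes / sampleDocs;
        return (long) (perDoc * targetDocs);
    }

    public static void main(String[] args) throws IOException {
        Path sampleIndex = Paths.get("sample-index"); // placeholder path
        long sampleDocs = 1_000_000L;                 // docs in the sample
        long targetDocs = 250_000_000L;               // full corpus
        if (Files.isDirectory(sampleIndex)) {
            long bytes = directoryBytes(sampleIndex);
            System.out.printf("estimated full index: ~%d GB%n",
                    extrapolate(bytes, sampleDocs, targetDocs) >> 30);
        } else {
            System.out.println("sample-index not found; index a sample first");
        }
    }
}
```

Measuring a real sample this way sidesteps the spreadsheet's assumptions, since the sample already reflects your exact mix of stored fields, positions, and analyzers; the main risk is a sample that is too small or unrepresentative of the full data.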
