lucene-java-user mailing list archives

From Lothar Maerkle
Subject Estimate index filesystem requirements
Date Sun, 18 Nov 2007 15:15:48 GMT

I'm wondering whether there is some kind of formula to estimate the size
of a Lucene index. Searching the list, I did not find any pointers.

Does anybody have a hint?

What I have figured out from the file format description and some
empirical tests is the following, per index file:
  field-data .fdt:  NumberOfDocs * NumberOfFieldsPerDoc
  field-index .fdx: NumberOfDocs * 8
  field-info .fnm:  ignored

  term-data .tis:   NumberOfTerms * 8
  term-index .tii:  no idea so far
  term-freq .frq:   estimated as NumberOfDocs * NumberOfTerms
  norms .nrm:       NumberOfDocs

This applies only to unstored fields, of course.

I estimate the total NumberOfTerms of my document collection at 10% of
NumberOfDocs. Does anyone have similar experience?
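
To make the estimates above concrete, here is a rough sketch in Python. The
function and parameter names are hypothetical, the results are assumed to be
in bytes (the formulas above do not state a unit), and .fnm and .tii are left
out because there is no figure for them. If no term count is given, it falls
back to the 10% guess.

```python
def estimate_index_size(num_docs, fields_per_doc, num_terms=None):
    """Rough per-file size estimate following the formulas in this message.

    Assumptions: results are treated as bytes; .fnm is ignored and .tii
    is omitted because no estimate is available for it.
    """
    if num_terms is None:
        # Heuristic from this message: terms ~ 10% of the number of documents.
        num_terms = num_docs // 10

    sizes = {
        ".fdt": num_docs * fields_per_doc,  # field data
        ".fdx": num_docs * 8,               # field index
        ".tis": num_terms * 8,              # term data
        ".frq": num_docs * num_terms,       # term frequencies
        ".nrm": num_docs,                   # norms
    }
    # The sum is computed before the "total" key is added to the dict.
    sizes["total"] = sum(sizes.values())
    return sizes


if __name__ == "__main__":
    for name, size in estimate_index_size(1_000_000, 5).items():
        print(f"{name:>6}: {size:,}")
```

With these formulas the .frq line quickly dominates the total, since it is
the only one that grows with NumberOfDocs times NumberOfTerms.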

