lucene-java-user mailing list archives

From Lothar Maerkle <lot...@mountproc.org>
Subject Estimate index filesystem requirements
Date Sun, 18 Nov 2007 15:15:48 GMT
Hi,

I'm wondering whether there is some kind of formula to estimate the size of a
Lucene index. Searching the list, I did not find any pointers.

Does anybody have a hint?

What I figured out from the file format description and some empirical
tests is the following estimate for each index file:
Field-files:
  field-data .fdt:  NumberOfDocs * NumberOfFieldsPerDoc
  field-index .fdx: NumberOfDocs * 8
  field-info .fnm:  ignored

Term-files:
  term-data .tis:   NumberOfTerms * 8
  term-index .tii:  no idea so far
  term-freq .frq:   estimated as NumberOfDocs * NumberOfTerms

Normalization:
  norm file .nrm:   NumberOfDocs

This concerns only unstored fields, of course.

I estimate the total NumberOfTerms of my document collection as 10% of
NumberOfDocs. Does anyone have similar experience?
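
To make the arithmetic concrete, the estimates above can be put into a small
Java sketch. It is only a rough illustration: the class and method names are
made up, the results are assumed to be in bytes (the formulas do not state a
unit), .fnm and .tii are left out because no estimate is given for them, and
the sample numbers in main are hypothetical rather than measured.

```java
public class IndexSizeEstimate {

    // Field files, following the rough formulas from this message.
    public static long fdt(long numDocs, long fieldsPerDoc) {
        return Math.multiplyExact(numDocs, fieldsPerDoc);
    }

    public static long fdx(long numDocs) {
        return Math.multiplyExact(numDocs, 8L);
    }

    // Term files. The .frq formula multiplies docs by terms, so it grows
    // very quickly with collection size.
    public static long tis(long numTerms) {
        return Math.multiplyExact(numTerms, 8L);
    }

    public static long frq(long numDocs, long numTerms) {
        return Math.multiplyExact(numDocs, numTerms);
    }

    // Norm file.
    public static long nrm(long numDocs) {
        return numDocs;
    }

    // Heuristic from this message: NumberOfTerms is about 10% of NumberOfDocs.
    public static long termsFromDocs(long numDocs) {
        return numDocs / 10L;
    }

    // Sum of all files that have an estimate (.fnm and .tii are not included).
    public static long total(long numDocs, long fieldsPerDoc, long numTerms) {
        return fdt(numDocs, fieldsPerDoc)
                + fdx(numDocs)
                + tis(numTerms)
                + frq(numDocs, numTerms)
                + nrm(numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;   // hypothetical collection size
        long fieldsPerDoc = 5L;      // hypothetical field count
        long numTerms = termsFromDocs(numDocs);

        System.out.println(".fdt   = " + fdt(numDocs, fieldsPerDoc));
        System.out.println(".fdx   = " + fdx(numDocs));
        System.out.println(".tis   = " + tis(numTerms));
        System.out.println(".frq   = " + frq(numDocs, numTerms));
        System.out.println(".nrm   = " + nrm(numDocs));
        System.out.println("total  = " + total(numDocs, fieldsPerDoc, numTerms));
    }
}
```

With these formulas the .frq term dominates the total for any larger
collection, so that is the estimate I am least sure about.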

lofi

