lucene-java-user mailing list archives

From "Martin Sevigny" <sevi...@ajlsm.com>
Subject RE : Indexing very large sets (10 million docs)
Date Mon, 28 Jul 2003 16:36:46 GMT
Roger,

> Given that on my previous 16GB partition it managed 1.5 million rows
> before failing, it looks like disk space requirements grow
> exponentially with number of documents indexed.  Can anyone comment
> whether this should be true?

Exponentially? That would be surprising.

When you add documents, I think these factors make your index larger:

- You add new terms. This effect tends to fade as more documents are
indexed, for instance if your data is in only one language.
- You add new references from terms to documents. This should not grow
exponentially, no?
- You add new content in stored fields. This should be proportional to
the number of documents, and since you only have one stored field...
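
To make the three factors above concrete, here is a rough sketch of
how they might add up. It is only a back-of-the-envelope model, not a
description of Lucene's actual file format: the class name and every
constant are made-up assumptions, and the sublinear term growth uses a
Heaps'-law style power curve, which is my own choice of shape for
"new terms become rarer over time".

```java
final class IndexGrowthSketch {
    // All constants are illustrative assumptions, not measured Lucene figures.
    static final double HEAPS_K = 40.0;               // assumed vocabulary scale factor
    static final double HEAPS_BETA = 0.5;             // assumed exponent, below 1 = sublinear
    static final double TOKENS_PER_DOC = 200.0;       // assumed average tokens per document
    static final double BYTES_PER_TERM = 20.0;        // assumed dictionary cost per distinct term
    static final double UNIQUE_TERMS_PER_DOC = 120.0; // assumed distinct terms per document
    static final double BYTES_PER_POSTING = 2.0;      // assumed cost of one term-to-document reference
    static final double STORED_BYTES_PER_DOC = 100.0; // assumed size of the single stored field

    // Factor 1: new terms. Grows more slowly than the number of documents.
    static double termDictionaryBytes(long docs) {
        double tokens = docs * TOKENS_PER_DOC;
        return BYTES_PER_TERM * HEAPS_K * Math.pow(tokens, HEAPS_BETA);
    }

    // Factor 2: references from terms to documents. Linear in the number of documents.
    static double postingsBytes(long docs) {
        return docs * UNIQUE_TERMS_PER_DOC * BYTES_PER_POSTING;
    }

    // Factor 3: stored field contents. Linear in the number of documents.
    static double storedFieldBytes(long docs) {
        return docs * STORED_BYTES_PER_DOC;
    }

    static double estimateBytes(long docs) {
        return termDictionaryBytes(docs) + postingsBytes(docs) + storedFieldBytes(docs);
    }

    public static void main(String[] args) {
        // Doubling the document count each step shows how the estimate scales.
        for (long docs = 1_000_000L; docs <= 16_000_000L; docs *= 2) {
            System.out.printf("%,12d docs -> %.2f GB%n", docs, estimateBytes(docs) / 1e9);
        }
    }
}
```

Under these assumptions, doubling the number of documents never more
than doubles the estimate, because two of the factors are linear and
the third grows more slowly than that.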

I've seen indices roughly 20% larger than the data indexed, with a few
stored fields, for a few thousand documents.

I'm still not convinced that you don't have a problem somewhere else. I
didn't inspect your code, but are you sure that you are not indexing
data more than once? Or storing lots of fields?
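
One cheap way to rule out double indexing might be to track a unique
key per row before handing it to the indexer. The sketch below is
plain Java with no Lucene calls; `DuplicateGuard` and the row keys are
hypothetical names, and I am assuming each row has some unique key and
that the set of keys fits in memory, which may not hold for 10 million
rows without a larger heap.

```java
import java.util.HashSet;
import java.util.Set;

final class DuplicateGuard {
    private final Set<String> seenKeys = new HashSet<>();
    private long duplicates = 0;

    /** Returns true the first time a key is offered, false on every repeat. */
    boolean firstTime(String key) {
        if (seenKeys.add(key)) {
            return true;
        }
        duplicates++;
        return false;
    }

    long duplicateCount() {
        return duplicates;
    }

    public static void main(String[] args) {
        DuplicateGuard guard = new DuplicateGuard();
        String[] rowKeys = {"row-1", "row-2", "row-1"}; // hypothetical sample keys
        for (String key : rowKeys) {
            if (guard.firstTime(key)) {
                // The real indexing call for this row would go here.
            }
        }
        System.out.println("duplicates skipped: " + guard.duplicateCount());
    }
}
```

If the duplicate count is well above zero at the end of a run, the
loop feeding the indexer is probably the place to look first.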

Martin Sévigny



