lucene-java-user mailing list archives

From Peter Becker <>
Subject Re: Indexing very large sets (10 million docs)
Date Mon, 28 Jul 2003 21:49:03 GMT
Roger Ford wrote:
[...index size troubles...]

> Believe it or not, this 10 million documents was meant to be a single
> partition of a much larger dataset. I'm not sure I'm at liberty to
> discuss in detail the data I'm indexing - but it's a massive
> genealogical database.


Maybe your data type is the problem. Did you check what kind of terms
you get? (you can use for that) I can imagine that the tokenizing just
goes wrong, creating a few terms too many -- and maybe a high hit rate
for each term, too. Both would increase the index size -- at least if I
were writing an index with my limited knowledge of the field :-)
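One quick way to see whether tokenizing "just goes wrong" is to count the distinct terms that two different tokenizers produce from the same text. Below is a minimal, self-contained sketch of that idea; the sample string, class name, and method names are all hypothetical illustrations, not Lucene API. It contrasts a whitespace split with an aggressive split on every non-alphanumeric character, which shatters dates and record IDs (common in genealogical data) into many small fragments:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TermCheck {
    // Hypothetical sample resembling a genealogical record:
    // birth/death dates, a record ID, and a name.
    static final String SAMPLE =
        "b.1821-03-04 d.1880-11-30 id:XJ-4471 Smith, John";

    // Split on whitespace only: punctuation stays attached to tokens.
    static Set<String> whitespaceTerms(String text) {
        return new HashSet<>(Arrays.asList(text.split("\\s+")));
    }

    // Split on every non-alphanumeric character: dates and IDs shatter
    // into many tiny fragments, inflating the distinct-term count.
    static Set<String> shatteredTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println("whitespace terms: " + whitespaceTerms(SAMPLE).size());
        System.out.println("shattered terms:  " + shatteredTerms(SAMPLE).size());
    }
}
```

Multiplied over 10 million documents, that kind of fragmentation adds up. In Lucene itself the equivalent check would be to enumerate the terms actually stored in the index (via IndexReader) and look at each term's document frequency: lots of near-unique terms like date or ID fragments is a strong hint that the analyzer is the problem.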

