lucene-java-user mailing list archives

From Roger Ford <roger.f...@oracle.com>
Subject Re: Indexing very large sets (10 million docs)
Date Mon, 28 Jul 2003 16:28:51 GMT
Lichtner, Guglielmo wrote:
> That's 46 hits/s. That's not bad, actually.

It's not the time I'm worried about, so much as the disk consumption.
It's just failed optimizing 3 million documents with "No space left on
device". That's 100GB it's used!

Given that on my previous 16GB partition it managed 1.5 million rows
before failing, it looks like disk space requirements grow much faster
than linearly with the number of documents indexed. Can anyone comment
on whether this should be true?
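
As a rough check on that growth rate, the two figures quoted above (16GB at 1.5 million rows, 100GB at 3 million documents) can be put through some quick arithmetic. This is only a sketch: it assumes the two runs are directly comparable, which may not hold, since the second failure happened during an optimize, and merging segments typically needs extra working space beyond the final index size. The variable names are illustrative.

```python
import math

# Figures taken from the message; treating them as comparable is an assumption.
docs_small, disk_small_gb = 1.5e6, 16.0    # earlier run, 16GB partition filled
docs_large, disk_large_gb = 3.0e6, 100.0   # later run, 100GB used during optimize

doc_ratio = docs_large / docs_small          # how much the document count grew
disk_ratio = disk_large_gb / disk_small_gb   # how much the disk usage grew

# If disk usage followed a power law, disk = c * docs**k, then
# k = log(disk_ratio) / log(doc_ratio). Linear growth would give k = 1.
power_law_exponent = math.log(disk_ratio) / math.log(doc_ratio)

print(f"documents grew {doc_ratio:.2f}x, disk usage grew {disk_ratio:.2f}x")
print(f"implied power-law exponent: {power_law_exponent:.2f}")
```

Doubling the document count went with a 6.25x increase in disk used, an implied exponent of about 2.6. Two data points cannot distinguish a power law from exponential growth, or from a transient spike during optimization, so this only confirms that the observed usage was well above linear.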

> If there is a natural way to partition the documents (there might be, since
> they are so many), then you can manage the partition yourself and just
> search on multiple indexes. It probably also makes the system more robust.
> In case someone

Believe it or not, these 10 million documents were meant to be a single
partition of a much larger dataset. I'm not sure I'm at liberty to
discuss in detail the data I'm indexing - but it's a massive
genealogical database.
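
For reference, the "manage the partition yourself and search on multiple indexes" idea from the quoted text can be sketched roughly as follows. This is a toy illustration in plain Python, not Lucene code: `TinyIndex`, `partition_for`, and `search_all` are hypothetical names, the "score" is just a raw term count, and the modulo routing rule is an assumption made for the example.

```python
import heapq
from collections import defaultdict


class TinyIndex:
    """Hypothetical stand-in for one index: term -> {doc_id: term count}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in text.lower().split():
            counts = self.postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, term):
        # Returns (score, doc_id) pairs; the score here is the raw term count.
        matches = self.postings.get(term.lower(), {})
        return [(count, doc_id) for doc_id, count in matches.items()]


def partition_for(doc_id, num_partitions):
    # Assumed routing rule: spread documents evenly by numeric id.
    return doc_id % num_partitions


def search_all(indexes, term, limit):
    # Query every partition, then merge the hits into one ranked list.
    hits = []
    for index in indexes:
        hits.extend(index.search(term))
    return heapq.nlargest(limit, hits)


indexes = [TinyIndex() for _ in range(3)]
docs = {
    0: "ford family record",
    1: "ford ford census",
    2: "smith census",
    3: "ford parish register",
}
for doc_id, text in docs.items():
    indexes[partition_for(doc_id, len(indexes))].add(doc_id, text)

top_hits = search_all(indexes, "ford", 2)
print(top_hits)
```

Each partition stays small enough to build and optimize on its own, and losing one index affects only its share of the documents. A real system would also need scores that are comparable across indexes, which this sketch sidesteps by using raw counts.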

I think I'm going to have to limit the set to 2.5 million, and run
my tests on that instead.

- Roger

