lucene-java-user mailing list archives

From "Martin Sevigny" <sevi...@ajlsm.com>
Subject RE : Indexing very large sets (10 million docs)
Date Mon, 28 Jul 2003 16:19:50 GMT
Roger,

Just to double-check...

> Each document is typically only around 2K in size. Each field is 
> free-text indexed, but only the "key" field is stored.

> After experimenting, I've set
>    Java memory to 750MB
>    writer.mergeFactor = 10000
>    - and run an optimize every 50,000 documents to try to keep down
> the number of open files.
> 
> So far, the furthest I've managed to get is 3 million docs, which used
> 
>    22,913 simultaneous open files (the maximum on Windows is 2035!)
>    Up to 79 GB of disk space
>    Around 18 hours of indexing time

You get a 79GB index with 3 million x 2KB documents? The documents
themselves add up to 6GB or so, which means that your index takes about
13 times the space needed to store the documents (if they were stored as
files).

For a single stored field, this factor is pretty hard to believe. Am I
missing something?
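
For reference, the arithmetic behind that factor can be checked with a
short sketch. It assumes decimal units (1 GB = 10^9 bytes) and treats
"around 2K" as exactly 2,000 bytes; neither is stated in the thread, and
the variable names are only illustrative.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (not from the thread): decimal units, exactly 2,000 bytes per doc.
DOC_COUNT = 3_000_000        # documents indexed so far
AVG_DOC_BYTES = 2_000        # "typically only around 2K in size"
INDEX_BYTES = 79 * 10**9     # "Up to 79 GB of disk space"

# Total size of the raw documents if they were stored as plain files.
raw_bytes = DOC_COUNT * AVG_DOC_BYTES

# How many times larger the index is than the raw text.
ratio = INDEX_BYTES / raw_bytes

print(f"raw text: {raw_bytes / 10**9:.1f} GB")
print(f"index/raw ratio: {ratio:.1f}x")
```

With binary units the ratio shifts only slightly, so the order of
magnitude of the question stays the same.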

Martin Sévigny



