lucene-java-user mailing list archives

From Doron Cohen <>
Subject Re: Large index question
Date Thu, 12 Oct 2006 21:42:44 GMT
"Scott Smith" <> wrote on 12/10/2006 14:14:57:

> Suppose I want to index 500,000 documents (average document size is
> 4 KB).  Let's assume I create a single index and that the index is
> static (I'm not going to add any new documents to it).  I would guess
> the index would be around 2GB.

The input data size is ~2GB but the index itself may be smaller,
particularly if not storing fields/termvectors.
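
As a quick check of the ~2GB figure, here is a minimal sketch of the
arithmetic using the numbers from the question (500,000 documents at about
4 KB each). The class and method names are made up for illustration, and the
result is only the raw input size; it says nothing about the final index
size, which depends on what is stored.

```java
import java.util.Locale;

final class IndexSizeEstimate {
    /** Raw input size in bytes: document count times average document size. */
    static long inputBytes(long docCount, long avgDocBytes) {
        // multiplyExact throws on overflow instead of silently wrapping
        return Math.multiplyExact(docCount, avgDocBytes);
    }

    public static void main(String[] args) {
        // Figures from the question: 500,000 documents, ~4 KB each
        long bytes = inputBytes(500_000L, 4L * 1024L);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        // Prints: input ~ 2048000000 bytes (1.91 GiB)
        System.out.println(String.format(Locale.ROOT,
                "input ~ %d bytes (%.2f GiB)", bytes, gib));
    }
}
```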

> Now, I do searches against this on a somewhat beefy machine (2GB RAM,
> Core 2 Duo, Windows XP).  Does anyone have any idea what kinds of search
> times I can expect for moderately complicated searches (several sets of
> keywords against several fields)?  Are there things I can do to increase
> search performance?  For example, does Lucene like lots of RAM, lots of
> CPU, faster HD, all of the above?  Am I better splitting the index file
> into 2 (N?) versions and search on multiple indexes simultaneously?
> Anyone have any thoughts about this?

Indexing time (at least for plain text or simple HTML) would be something
near half an hour, so you might just give it a try. If the index turns out
to be small enough to reside in RAM (and you don't need that RAM for other
activities at the same time), you could try RAMDirectory. I wonder if anyone
has ever compared RAMDirectory to a "hot" searcher over FSDirectory. Having
all the index data in RAM seems like it should be faster than relying on the
system's IO caching. But if for some reason the RAMDirectory cannot stay in
RAM all the time, I would assume that paging in and out would make it more
costly than using FSDirectory and just counting on the system's IO caching.
In the latter case, see the relevant discussions on warming a searcher and
caching filters.
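
One way to check whether the index is "small enough to reside in RAM" once
it has been built is to add up the file sizes in the index directory and
compare the total to the JVM's maximum heap. The sketch below uses only the
standard library and makes no Lucene calls. It assumes the index files sit
directly in one flat directory, treats on-disk size as a rough proxy for
in-memory size, and uses a headroom fraction of 0.5 that is purely an
illustrative guess. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class IndexFitsInRam {
    /** Sums the sizes of the regular files directly inside indexDir. */
    static long directoryBytes(Path indexDir) throws IOException {
        long total = 0;
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }

    /** True if the index uses at most headroomFraction of the heap. */
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes,
                              double headroomFraction) {
        return indexBytes <= (long) (maxHeapBytes * headroomFraction);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IndexFitsInRam <path-to-index-directory>
        Path indexDir = Paths.get(args[0]);
        long indexBytes = directoryBytes(indexDir);
        long maxHeap = Runtime.getRuntime().maxMemory();
        // 0.5 is an assumed safety margin, not a recommendation
        boolean fits = fitsInHeap(indexBytes, maxHeap, 0.5);
        System.out.println("index bytes: " + indexBytes
                + ", max heap: " + maxHeap
                + ", candidate for RAMDirectory: " + fits);
    }
}
```

If the check fails, that points back to the FSDirectory route and relying on
the system's IO caching, as described above.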
