lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <>
Subject Re: Making a case for Lucene
Date Thu, 01 Jul 2004 19:24:11 GMT
 > The best example that I've been able to find is the Yahoo research
 > lab - as I understand it, this is a Nutch (i.e. Lucene)
 > implementation that's providing impressive performance over a
 > 100 million document repository.

This demo runs on a handful of boxes.  It was originally running on 
three dual-processor boxes, but I think Yahoo! subsequently moved it to 
six or eight single-processor boxes.  Queries are broadcast to all 
servers, and the top-scoring matches overall are presented.

In Nutch-based benchmarks, we found that a single-processor box with 4GB 
of memory and a 2M page Nutch index (i.e., the entire index fits in RAM) 
could handle over 20 Nutch searches/second.  A box with 1GB of memory 
and a 20M page Nutch index (i.e., the entire index does not fit in 
memory) could only handle around 1 or 2 Nutch searches/second.  These 
were done with Lucene 1.3.  Lucene 1.4 should be somewhat faster. 
Performance will obviously vary with processor speed, disk speed, 
average document size, average number terms per query, etc.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message