lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: Large index question
Date Fri, 13 Oct 2006 01:59:48 GMT
Lots of memory will help a lot. One of my DBSight customers runs on an
Intel Core Duo and has configured everything to stay in memory. The index
size is about 700 MB. When I checked his system's average response time,
it was 12 ms! I guess you can estimate what you will get from your beefy
machine.

So it may be a good idea to try your index in a 64-bit JVM with the
whole index in memory.
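
Before trying that, it may help to check whether the index can fit in
the JVM heap at all. The following is only a rough sketch that uses
nothing but the standard library: the class name, the helper names and
the 1.5x headroom factor are my own assumptions, not anything Lucene
provides, and the real in-memory footprint can differ from the on-disk
size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: compares on-disk index size with the JVM's max heap.
final class IndexFitsInHeap {

    // Sum the sizes of the regular files directly inside the index directory.
    static long indexSizeBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    // headroom is an assumed safety factor (e.g. 1.5) that leaves space for
    // the searcher's own data structures and the rest of the application.
    static boolean fitsInHeap(long indexBytes, long maxHeapBytes, double headroom) {
        return indexBytes * headroom <= maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        long size = indexSizeBytes(indexDir);
        long maxHeap = Runtime.getRuntime().maxMemory(); // governed by -Xmx
        System.out.printf("index: %d MB, max heap: %d MB, fits: %b%n",
                size >> 20, maxHeap >> 20, fitsInHeap(size, maxHeap, 1.5));
    }
}
```

Run it with the index directory as the only argument and with the same
-Xmx setting you plan to use for searching. If it reports that the index
does not fit, relying on the operating system's file cache is probably
the safer choice.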

For indexing, it's better to have faster disks, since indexing is an I/O-intensive process.

Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com


On 10/12/06, Doron Cohen <DORONC@il.ibm.com> wrote:
> "Scott Smith" <ssmith@mainstreamdata.com> wrote on 12/10/2006 14:14:57:
>
> > Suppose I want to index 500,000 documents (average document size is
> > 4 kB).  Let's assume I create a single index and that the index is
> > static (I'm not going to add any new documents to it).  I would guess
> > the index would be around 2 GB.
>
> The input data size is ~2 GB, but the index itself may be smaller,
> particularly if you are not storing fields or term vectors.
>
> > Now, I do searches against this on a somewhat beefy machine (2GB RAM,
> > Core 2 Duo, Windows XP).  Does anyone have any idea what kinds of search
> > times I can expect for moderately complicated searches (several sets of
> > keywords against several fields)?  Are there things I can do to increase
> > search performance?  For example, does Lucene like lots of RAM, lots of
> > CPU, faster HD, all of the above?  Am I better off splitting the index
> > file into 2 (N?) parts and searching multiple indexes simultaneously?
> >
> > Anyone have any thoughts about this?
>
> Indexing time (at least for plain text or simple HTML) would be something
> near half an hour, so you might just give it a try. If the index size turns
> out to be small enough to reside in RAM (and you don't need the RAM for
> other activities at the same time) you could try RAMDirectory. I wonder if
> anyone ever compared RAMDirectory to a "hot" searcher on top of FSDirectory.
> It seems that having all the index data in RAM would be faster than relying
> on I/O caching by the system, but if for some reason the RAMDirectory cannot
> stay in RAM all the time, I would assume that paging in and out would make
> it more costly than using FSDirectory and just counting on system I/O
> caching. In the latter case, see the relevant discussions on warming a
> searcher and caching filters.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
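
The numbers in the quoted messages are easy to sanity-check. This is a
small sketch of the arithmetic only: it assumes "4 kB" means 4096 bytes
and takes the half-hour indexing time as the rough guess it was offered
as; the class and method names are made up for illustration.

```java
// Hypothetical back-of-the-envelope estimate for the quoted scenario.
final class IndexEstimate {

    // Total raw input size in bytes.
    static long inputBytes(long docs, long avgDocBytes) {
        return docs * avgDocBytes;
    }

    // Indexing throughput implied by a total indexing time.
    static double docsPerSecond(long docs, double seconds) {
        return docs / seconds;
    }

    public static void main(String[] args) {
        long docs = 500_000L;        // from the original question
        long avgDocBytes = 4 * 1024; // assumption: 4 kB = 4096 bytes
        double seconds = 30 * 60;    // "near half an hour", a rough guess

        long total = inputBytes(docs, avgDocBytes);
        System.out.printf("input size: %.2f GiB%n", total / (1024.0 * 1024 * 1024));
        System.out.printf("throughput: %.0f docs/s%n", docsPerSecond(docs, seconds));
    }
}
```

The input comes out at roughly the size of the machine's 2 GB of RAM, so
whether the index can live in memory depends on how much smaller than
the input it turns out to be.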
