lucene-java-user mailing list archives

From Christopher Condit <con...@sdsc.edu>
Subject Best practices for searcher memory usage?
Date Tue, 13 Jul 2010 21:49:57 GMT
We're getting up there in terms of corpus size for our Lucene indexing application:
* 20 million documents
* all fields need to be stored
* 10 short fields / document 
* 1 long free-text field / document (analyzed with a custom shingle-based analyzer; rough shape sketched after this list)
* 140GB total index size
* Optimized into a single segment
* Must run over NFS due to our VMware setup
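
For context, the free-text analyzer is in the same family as contrib's
ShingleAnalyzerWrapper - this isn't our actual analyzer, just a minimal
sketch of the shape (3.0-style API; class name is made up):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    public class ShingleSketch {
        // Wraps a base analyzer so it also emits word n-grams ("shingles"),
        // here up to 3 tokens; the real analyzer adds custom filtering.
        static Analyzer buildAnalyzer() {
            return new ShingleAnalyzerWrapper(
                    new StandardAnalyzer(Version.LUCENE_30), 3);
        }
    }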

I think I've already taken the most common steps to reduce memory requirements and increase
performance on the search side, including (see the sketch after this list):
* omitting norms on all fields except two
* omitting term vectors
* indexing as few fields as possible
* reusing a single searcher
* splitting the index up into N shards for ParallelMultiSearcher
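
In code, those choices look roughly like this (3.0-style API; class and
variable names are just for illustration):

    import java.io.IOException;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.store.Directory;

    public class SearchSetupSketch {

        // Index time: store the value, skip norms and term vectors.
        static Field shortField(String name, String value) {
            return new Field(name, value, Field.Store.YES,
                    Field.Index.NOT_ANALYZED_NO_NORMS, Field.TermVector.NO);
        }

        // Search time: one long-lived searcher over read-only readers,
        // one per shard, fanned out through ParallelMultiSearcher.
        static Searchable openSearcher(Directory[] shardDirs) throws IOException {
            Searchable[] shards = new Searchable[shardDirs.length];
            for (int i = 0; i < shardDirs.length; i++) {
                shards[i] = new IndexSearcher(IndexReader.open(shardDirs[i], true));
            }
            return new ParallelMultiSearcher(shards); // reuse for every query
        }
    }

(Opening the readers read-only also skips some internal synchronization,
which helps under concurrent query load.)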

The application will run with -Xmx10g, but with anything less it bails out; it seems happier
if we feed it 12GB. Searches are starting to bog down a bit (5-10 seconds for some queries)...

Our next step is to deploy the shards as RemoteSearchables for the same ParallelMultiSearcher
over RMI, roughly as sketched below - but before I do that I'm curious:
* are there other ways to get that memory usage down?
* are there performance optimizations that I haven't thought of?
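
For concreteness, the RMI plan looks something like this (contrib's
RemoteSearchable; hosts, ports, and class names are made up):

    import java.rmi.Naming;
    import java.rmi.registry.LocateRegistry;
    import org.apache.lucene.search.ParallelMultiSearcher;
    import org.apache.lucene.search.RemoteSearchable;
    import org.apache.lucene.search.Searchable;

    public class RemoteShardSketch {

        // On each shard host: export the local searcher over RMI.
        static void exportShard(Searchable localSearcher) throws Exception {
            LocateRegistry.createRegistry(1099); // assumes no registry running yet
            Naming.rebind("//localhost:1099/shard",
                    new RemoteSearchable(localSearcher));
        }

        // On the front end: look up every shard and fan out as before.
        static Searchable connect(String[] shardUrls) throws Exception {
            Searchable[] shards = new Searchable[shardUrls.length];
            for (int i = 0; i < shardUrls.length; i++) {
                shards[i] = (Searchable) Naming.lookup(shardUrls[i]);
            }
            return new ParallelMultiSearcher(shards);
        }
    }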

Thanks,
-Chris



