lucene-dev mailing list archives

From Wolfgang Hoschek <whosc...@lbl.gov>
Subject Re: [Performance] Streaming main memory indexing of single strings
Date Tue, 03 May 2005 17:31:32 GMT
Here's a performance patch for MemoryIndex.MemoryIndexReader that 
caches the norms for a given field, avoiding repeated recomputation of 
the norms. Recall that, depending on the query, norms() can be called 
over and over again with mostly the same parameters. Thus, replace 
public byte[] norms(String fieldName) with the following code:

		/** performance hack: cache norms to avoid repeated expensive calculations */
		private byte[] cachedNorms;
		private String cachedFieldName;
		private Similarity cachedSimilarity;
		
		public byte[] norms(String fieldName) {
			byte[] norms = cachedNorms;
			Similarity sim = getSimilarity();
			// identity comparison: a hit requires the same String instance and the same Similarity
			if (fieldName != cachedFieldName || sim != cachedSimilarity) { // not cached?
				Info info = getInfo(fieldName);
				int numTokens = info != null ? info.numTokens : 0;
				float n = sim.lengthNorm(fieldName, numTokens);
				byte norm = Similarity.encodeNorm(n);
				norms = new byte[] {norm};
				
				cachedNorms = norms;
				cachedFieldName = fieldName;
				cachedSimilarity = sim;
				if (DEBUG) System.err.println("MemoryIndexReader.norms: " + fieldName + ":" + n + ":" + norm + ":" + numTokens);
			}
			return norms;
		}


The effect can be substantial when measured with the profiler, so it's 
worth it.
Wolfgang.
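
For readers who want to try the idea outside of Lucene, here is a minimal standalone sketch of the same single-entry cache. It is not the MemoryIndex code: NormCacheSketch, its token-count map, and the toy norm formula are all hypothetical stand-ins for getInfo(), Similarity.lengthNorm() and Similarity.encodeNorm(). It also compares field names with equals() rather than by identity, so the cache hits even when the caller passes a different String instance, and it leaves out the Similarity check because the sketch has only one scoring function.

```java
import java.util.Map;
import java.util.HashMap;

/** Standalone sketch of a single-entry norms cache; not Lucene code. */
class NormCacheSketch {
	// hypothetical stand-in for MemoryIndex's per-field Info (token counts only)
	private final Map<String, Integer> tokenCounts;

	private byte[] cachedNorms;      // last computed result
	private String cachedFieldName;  // field the cached result belongs to
	int computations = 0;            // number of cache misses, for inspection

	NormCacheSketch(Map<String, Integer> tokenCounts) {
		this.tokenCounts = tokenCounts;
	}

	byte[] norms(String fieldName) {
		byte[] norms = cachedNorms;
		if (norms == null || !fieldName.equals(cachedFieldName)) { // not cached?
			Integer count = tokenCounts.get(fieldName);
			int numTokens = count != null ? count : 0;
			// toy formula (assumption): 1/sqrt(numTokens), scaled into one byte
			float n = numTokens > 0 ? (float) (1.0 / Math.sqrt(numTokens)) : 0f;
			byte norm = (byte) Math.round(n * 127f);
			norms = new byte[] {norm};

			cachedNorms = norms;
			cachedFieldName = fieldName;
			computations++;
		}
		return norms;
	}

	public static void main(String[] args) {
		Map<String, Integer> counts = new HashMap<>();
		counts.put("content", 16);
		NormCacheSketch reader = new NormCacheSketch(counts);
		for (int i = 0; i < 1000; i++) {
			reader.norms("content"); // only the first call computes anything
		}
		// prints: calls=1000 computations=1
		System.out.println("calls=1000 computations=" + reader.computations);
	}
}
```

Because only one entry is kept, a query that alternates between two fields would miss on every call. The patch above accepts that trade-off on the grounds that norms() is mostly called with the same parameters.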

