lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: Changing Document Boosts without Reindexing
Date Sat, 23 Oct 2004 13:42:31 GMT
Dan Climan wrote:
> I wanted to test several strategies for document boosting. It seems the
> only way to do this is to reindex every Document and call setBoost, which
> takes a long time. I had an idea for how to do this without reindexing,
> and I am curious whether there is a better strategy or whether there are
> additional points I should consider in this approach.
> 
> 1) Optimize the index
> 2) Get the internal lucene doc id for each document
> 3) Update the boosts
>     IndexReader ir = IndexReader.open(indexDir);
>     IndexSearcher searcher = new IndexSearcher(ir);
>     Similarity sim = searcher.getSimilarity();
>     Collection indexedFields = ir.getFieldNames(true);
>     Iterator it = indexedFields.iterator();
>     while (it.hasNext()) {
>         String f = (String) it.next();
>         byte[] norms = ir.norms(f);
>         for (int i = 0; i < numDocs; i++) {
>             float oldNorm = sim.decodeNorm(norms[i]);
>             // Rescale by the ratio of the new boost to the old boost.
>             float newNorm = oldNorm * (newDocBoost[i] / oldDocBoost[i]);
>             // Encode the rescaled value, not the old byte.
>             norms[i] = sim.encodeNorm(newNorm);
>         }
>     }
> 4) Write new norms files
> 
> Does this become prohibitively complicated using a compound file system?
> 
> Comments?
> 
> Thanks,
> Dan

IndexReader has a setNorm method. It should also work for indexes with
compound files. After (re)setting the norm, a separate norms-file is
generated which will be reintegrated into the compound file after the
next optimize or merge.
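
As a rough sketch of how the rescaling from the quoted message could be
combined with setNorm: the code below does not use Lucene itself. NormSink
is a hypothetical stand-in, assumed to mirror
IndexReader.setNorm(int doc, String field, float value), and BoostRescaler
and its method names are made up for illustration. It assumes the old and
new boosts per document are known and that the old norms have already been
decoded to floats.

```java
// Hypothetical stand-in for the single IndexReader method used here;
// assumed to mirror setNorm(int doc, String field, float value).
interface NormSink {
    void setNorm(int doc, String field, float value);
}

final class BoostRescaler {
    private BoostRescaler() {}

    // Scale an already decoded norm by the ratio of new to old document boost.
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        if (oldBoost <= 0f) {
            throw new IllegalArgumentException("oldBoost must be positive");
        }
        return oldNorm * (newBoost / oldBoost);
    }

    // Push the rescaled norm of every document for one field to the sink.
    static void apply(NormSink sink, String field,
                      float[] oldNorms, float[] oldBoosts, float[] newBoosts) {
        for (int doc = 0; doc < oldNorms.length; doc++) {
            float newNorm = rescale(oldNorms[doc], oldBoosts[doc], newBoosts[doc]);
            sink.setNorm(doc, field, newNorm);
        }
    }
}
```

With a real index, the sink would presumably forward to the IndexReader,
once per indexed field, and the reader would be closed afterwards so the
changes are written. Since norms are stored as one byte per document and
field, the values read back will only approximate the ones passed in.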

Christoph
