Hi all,

I'm implementing a mixture of language models in Lucene 4.0.0. To be precise, here is the math. The ranking score for a query q is

$$p(q \mid \theta) = \prod_{t \in q} p(t \mid \theta)$$

where

$$p(t \mid \theta) = \sum_f \alpha_f \, p(t \mid \theta^f)$$

and

$$p(t \mid \theta^f) = \frac{\mathrm{freq}(t) + \mu_f \, p(t \mid \theta_c^f)}{\mathrm{length}(f) + \mu_f}$$

Here $\mu_f$ is the Dirichlet prior for field f and $\alpha_f$ is the mixture weight of field f.

I've extended LMDirichletSimilarity to work with per-field priors:

```java
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.Norm;
import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.LMDirichletSimilarity;

public class LMPerFieldDirichletSimilarity extends LMDirichletSimilarity {

  /** Returns the Dirichlet-smoothed probability p(t | theta^f), not a log score. */
  @Override
  protected float score(BasicStats stats, float freq, float docLen) {
    // Per-field prior: mu_f is taken to be the average length of field f.
    float mu = stats.getAvgFieldLength();
    float collectionProbability = ((LMStats) stats).getCollectionProbability();
    return (freq + mu * collectionProbability) / (docLen + mu);
  }

  /** Stores the field length in the one-byte norm; lengths above 255 saturate instead of wrapping. */
  @Override
  public void computeNorm(FieldInvertState state, Norm norm) {
    norm.setByte((byte) Math.min(state.getLength(), 255));
  }

  /** Decodes the byte as unsigned, so lengths 128..255 are not read back as negative numbers. */
  @Override
  protected float decodeNormValue(byte norm) {
    return norm & 0xFF;
  }
}
```

With this, I can combine CustomScoreQuery, BooleanQuery and FieldsQuery to retrieve the relevant documents and compute the ranking function (the first probability).

However, my current solution omits the p(t | θ^f) values for fields that contain no occurrence of a given term t. Those values should be computed by LMPerFieldDirichletSimilarity.score with freq = 0. Presumably the problem is that Lucene only visits the documents that appear in a term's postings for a given field, so a document without t in field f is never passed to the scorer for f. This is not a serious issue with LMDirichletSimilarity and a single field, since such documents are simply irrelevant. With multi-field documents, however, these values cannot be omitted: as soon as a document contains at least one occurrence of t, no matter in which field, every field contributes to p(t | θ).

How would you add these values while scoring?
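To make the missing contribution concrete, here is a minimal plain-Java sketch of the formulas above, independent of Lucene. The class, method and parameter names are hypothetical, and the numbers in `main` are made up for illustration. The point it shows is that a field with freq = 0 still adds α_f · μ_f · p(t | θ_c^f) / (length(f) + μ_f) to p(t | θ), so scoring only the fields whose postings contain t underestimates the mixture.

```java
public class FieldMixtureLM {

    /** Dirichlet-smoothed p(t | theta^f); freq may be 0 when t does not occur in field f. */
    static double fieldProbability(int freq, int fieldLength, double mu, double collectionProb) {
        return (freq + mu * collectionProb) / (fieldLength + mu);
    }

    /**
     * p(t | theta) = sum_f alpha_f * p(t | theta^f), summed over ALL fields of the
     * document, including fields where freq[f] == 0.
     */
    static double termProbability(int[] freq, int[] fieldLength, double[] mu,
                                  double[] collectionProb, double[] alpha) {
        double p = 0.0;
        for (int f = 0; f < alpha.length; f++) {
            p += alpha[f] * fieldProbability(freq[f], fieldLength[f], mu[f], collectionProb[f]);
        }
        return p;
    }

    /** p(q | theta) = product over query terms; freqs[i][f] is the frequency of term i in field f. */
    static double queryProbability(int[][] freqs, int[] fieldLength, double[] mu,
                                   double[][] collectionProbs, double[] alpha) {
        double p = 1.0;
        for (int i = 0; i < freqs.length; i++) {
            p *= termProbability(freqs[i], fieldLength, mu, collectionProbs[i], alpha);
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical two-field document: field 0 = "title", field 1 = "body".
        int[] fieldLength = {4, 100};
        double[] mu = {4.0, 100.0};   // e.g. average field lengths used as per-field priors
        double[] alpha = {0.3, 0.7};  // mixture weights
        // One query term that occurs twice in the body but not at all in the title.
        int[] freq = {0, 2};
        double[] pc = {0.01, 0.02};   // p(t | theta_c^f) per field

        double allFields = termProbability(freq, fieldLength, mu, pc, alpha);
        // What you get if only the field whose postings contain t is scored:
        double matchingOnly = alpha[1] * fieldProbability(freq[1], fieldLength[1], mu[1], pc[1]);
        System.out.println("all fields (freq=0 included): " + allFields);
        System.out.println("matching field only:          " + matchingOnly);
    }
}
```

In Lucene terms, the `alpha[0] * fieldProbability(0, ...)` term is exactly what a call to LMPerFieldDirichletSimilarity.score with freq = 0 for the non-matching field would supply.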