lucene-dev mailing list archives

From "Robert Engels" <>
Subject normalization BAD DESIGN ?
Date Tue, 06 Jan 2004 21:17:26 GMT
The design & implementation of the document/field normalization is very memory intensive.

It requires byte arrays totaling (number of documents * number of fields) bytes.

With a document store of 100 million documents and multiple fields, the
memory required is staggering.
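To make the scale concrete, a back-of-the-envelope sketch (the field count of 10 is a hypothetical, not from the original message):

```java
public class NormMemoryEstimate {
    // One norm byte per document per field, as described above.
    public static long normBytes(long numDocs, int numFields) {
        return numDocs * numFields;
    }

    public static void main(String[] args) {
        long numDocs = 100_000_000L; // 100 million documents
        int numFields = 10;          // hypothetical number of indexed fields
        long bytes = normBytes(numDocs, numFields);
        // 1,000,000,000 bytes, i.e. roughly 953 MB resident just for norms
        System.out.println(bytes / (1024 * 1024) + " MB of norms held in memory");
    }
}
```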

IndexReader has the following method definition:

public abstract byte[] norms(String field) throws IOException;

which is the source of the problem.

Even returning null from this method does not help, as PhraseScorer and its
derived classes maintain a reference and do not perform a null check.

I have modified line 105 of PhraseScorer to be

    score *= Similarity.decodeNorm(norms[first.doc]); // normalize
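A null-tolerant version of that step might look like the following sketch. The guard is my illustration, not the committed patch, and `decodeNorm` here is a self-contained stand-in for Lucene's `Similarity.decodeNorm`, which uses a table-driven decoding:

```java
public class NormGuard {
    // Stand-in for Similarity.decodeNorm; hypothetical linear decoding
    // used only so this example runs without Lucene on the classpath.
    static float decodeNorm(byte b) {
        return (b & 0xFF) / 255.0f;
    }

    // Guarded normalization: a null norms array means "do not normalize",
    // so the score passes through unchanged instead of throwing an NPE.
    static float applyNorm(float score, byte[] norms, int doc) {
        if (norms != null) {
            score *= decodeNorm(norms[doc]); // normalize
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(applyNorm(2.0f, null, 0));                  // null-safe
        System.out.println(applyNorm(2.0f, new byte[]{(byte) 255}, 0));
    }
}
```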

Would it not be a better design to define a method in IndexReader

float getNorm(String fieldname, int docnum);

so an implementation could cache this information in some fashion, or always
return 1.0 if it didn't care?
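A minimal sketch of how that accessor could be satisfied by different readers (the interface and names are illustrative, not Lucene API):

```java
public class GetNormSketch {
    // Hedged sketch of the proposed per-document accessor.
    interface NormSource {
        float getNorm(String fieldName, int docNum);
    }

    // A reader that opts out of normalization entirely: constant 1.0,
    // with no per-document byte[] held in memory at all.
    static final NormSource NO_NORMS = (field, doc) -> 1.0f;

    public static void main(String[] args) {
        // Scoring code multiplies by the returned norm, so 1.0 is a no-op.
        System.out.println(NO_NORMS.getNorm("body", 42));
    }
}
```

A caching implementation could instead look norms up lazily (e.g. from disk, an LRU cache, or a memory-mapped file) behind the same signature, which is exactly the flexibility the byte[]-returning method forecloses.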

Robert Engels

