lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject normalization BAD DESIGN ?
Date Tue, 06 Jan 2004 21:17:26 GMT
The design and implementation of document/field normalization is very
poor.

It requires a byte[] with (number of documents * number of fields)
elements!

With a document store of 100 million documents, with multiple fields, the
memory required is staggering.

IndexReader has the following method definition,

public abstract byte[] norms(String field) throws IOException;

which is the source of the problem.

Even returning null from this method does not help, as PhraseScorer and its
derived classes maintain a reference to the array and do not perform a null check.

I have modified line 105 of PhraseScorer to be

if(norms!=null)
    score *= Similarity.decodeNorm(norms[first.doc]); // normalize
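In context, the guarded step looks like the sketch below (class and method names here are hypothetical; decodeNorm is a local stand-in for Lucene's Similarity.decodeNorm, not its actual table-based implementation):

```java
// Sketch of the patched scoring step: skip normalization when norms is null.
class NormGuard {
    // Stand-in for Similarity.decodeNorm: maps a stored norm byte to a float.
    static float decodeNorm(byte b) {
        return (b & 0xFF) / 255f; // simplified; real Lucene uses a lookup table
    }

    static float score(float rawScore, byte[] norms, int doc) {
        float score = rawScore;
        if (norms != null) {                  // the added null check
            score *= decodeNorm(norms[doc]);  // normalize only when norms exist
        }
        return score;
    }
}
```

With the guard in place, a reader that returns null from norms() simply yields unnormalized scores instead of a NullPointerException.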

Would it not be a better design to define a method in IndexReader

float getNorm(String fieldname,int docnum);

so an implementation could cache this information in some fashion, or always
return 1.0 if it didn't care?
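A minimal sketch of that proposal (class names and the caching strategy are hypothetical, and the byte-to-float decode is again a stand-in for Similarity.decodeNorm):

```java
import java.util.HashMap;
import java.util.Map;

// Proposed accessor: subclasses may cache norms however they like,
// or inherit the default, which applies no normalization at all.
abstract class NormReader {
    public float getNorm(String fieldName, int docNum) {
        return 1.0f; // neutral factor: no per-document byte[] required
    }
}

// One possible override: decode on demand from a per-field cache.
class CachingNormReader extends NormReader {
    private final Map<String, byte[]> cache = new HashMap<>();

    void put(String field, byte[] norms) {
        cache.put(field, norms);
    }

    @Override
    public float getNorm(String fieldName, int docNum) {
        byte[] norms = cache.get(fieldName);
        if (norms == null) return 1.0f;       // field has no norms
        return (norms[docNum] & 0xFF) / 255f; // simplified decode
    }
}
```

The point of the design is that memory policy moves into the reader: a scorer calls getNorm(field, doc) per hit and never holds the whole array.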

Robert Engels




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

