lucene-dev mailing list archives

From Stefano Mazzocchi <>
Subject Re: Relevance boosting with the aid of semantic markup
Date Sat, 08 Dec 2001 23:54:39 GMT
Doug Cutting wrote:
> > From: Stefano Mazzocchi []
> >
> > So, architecturally, it could be added to Lucene by making the
> > vector-space generator pluggable (or at least, extensible).
> >
> > What do you think?
> It's not clear to me what this would entail.  For performance, Lucene's
> vector model is, in a sense, "compiled into" Lucene's architecture.  For
> example, Lucene currently has no explicit representation of a vector.  Some
> changes to term weighting or document normalization are easy to make, or
> make pluggable, but radical changes to the model might require radical
> architectural changes to Lucene.
> Can you propose a more general architecture that would still be capable of
> high-performance?

From the API point of view, a simple addition to Field:

 Field.RatedText(String name, String value, float rate);

would be enough to implement my previously proposed indexing scheme for
general XML documents.

The above should behave like Field.Text(name, value), except that the
'relevance' of this text is 'boosted' by the rate, used as a multiplying factor.
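To make the intended semantics concrete, here is a minimal Java sketch. It assumes a toy model in which a field's contribution to a document's score is its raw term frequency multiplied by its rate. RatedField, text(), termFrequency() and termScore() are hypothetical names, not Lucene's API, and real Lucene scoring is more involved.

```java
import java.util.Locale;

// Hypothetical sketch: not part of Lucene's API. It models what
// Field.RatedText(name, value, rate) might record: a text field plus a rate.
final class RatedField {
    final String name;
    final String value;
    final float rate; // multiplying factor applied to this field's relevance

    RatedField(String name, String value, float rate) {
        this.name = name;
        this.value = value;
        this.rate = rate;
    }

    // Equivalent of a plain Field.Text(name, value): the rate is 1.0.
    static RatedField text(String name, String value) {
        return new RatedField(name, value, 1.0f);
    }

    // Counts case-insensitive matches of term among whitespace-separated tokens.
    int termFrequency(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    // Toy relevance contribution: raw term frequency scaled by the rate.
    float termScore(String term) {
        return termFrequency(term) * rate;
    }

    public static void main(String[] args) {
        RatedField plain = RatedField.text("body", "Lucene is a search library");
        RatedField rated = new RatedField("title", "Lucene and more Lucene", 2.0f);
        System.out.println(plain.termScore("lucene")); // 1.0
        System.out.println(rated.termScore("lucene")); // 4.0
    }
}
```

In this model Field.Text is just the special case rate = 1.0. A real implementation would have to carry the rate through to Lucene's own scoring, which also involves factors such as idf and length normalization.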

So, if I have two documents




searching for "Lucene" should rank the second one higher than the first.
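The markup of the two example documents is not preserved in this copy, so the Java sketch below ranks two hypothetical stand-ins under the same toy model. A document is a map from field name to text, and each field name (imagined as coming from an XML element) has an assumed rate: 3.0 for "title" and 1.0 for "para". Names, rates and the scoring rule are illustrative only, not Lucene's behaviour.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy ranker: field names and rates are assumptions standing in
// for whatever the XML markup of a document would map to. Not Lucene's API.
final class BoostedRanking {
    static final Map<String, Float> RATES = Map.of("title", 3.0f, "para", 1.0f);

    // Sum over fields of (occurrences of term in the field) * (field rate).
    static float score(Map<String, String> doc, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        float total = 0f;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            float rate = RATES.getOrDefault(field.getKey(), 1.0f);
            for (String token : field.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    total += rate;
                }
            }
        }
        return total;
    }

    // Document ids ordered by descending score (ties keep insertion order).
    static List<String> rank(Map<String, Map<String, String>> docs, String term) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> score(docs.get(id), term)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        // First document: "Lucene" only appears in ordinary paragraph text.
        docs.put("doc1", Map.of("para", "Lucene is a text search library"));
        // Second document: "Lucene" appears in a higher-rated title field.
        docs.put("doc2", Map.of("title", "Lucene", "para", "A text search library"));
        System.out.println(rank(docs, "Lucene")); // [doc2, doc1]
    }
}
```

Because scores are plain sums, a term in a higher-rated field outweighs the same term in an unrated one, which gives the ordering asked for above. Whether per-field rates fit Lucene's compiled-in vector model without larger architectural changes is not something this toy model can answer.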

But I honestly have no idea what changes Lucene would need to support
this.

Do you think it would be feasible to provide such a feature without
impacting the rest of the system?

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
