I've been doing some analysis with Luke (BTW, it doesn't work with
StandardAnalyzer since the Version field was introduced) and discovered
a problem with field length boosting.
I have a document that represents a recording artist (e.g. Madonna, The
Beatles, etc.). It contains an artist field and an alias field. The
alias field contains other names the artist may be known by, so there
can be multiple aliases for an artist.
Pseudocode:
(
    // one artist name per document
    doc.addField(ArtistIndexField.ARTIST, rs.getString("name"));
    // one addField() call per alias, all under the same field name
    for (String alias : aliases.get(artistId)) {
        doc.addField(ArtistIndexField.ALIAS, alias);
    }
)
I'm finding that when I search for an artist by the alias field and the
value matches an alias in two different documents, the document with
the fewest aliases gets the best score, because the length boost of the
alias field is split between all the aliases on the other doc. If I
index the field with ANALYZED_NO_NORMS, then both documents return the
same score.
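
For reference, the effect above is consistent with how default length
normalization works as I understand it: DefaultSimilarity uses
1/sqrt(numTerms), and numTerms counts the tokens of every value added
under the same field name. The sketch below is plain Java that only
mimics that formula. The class and method names and the extra aliases
are made up for illustration, whitespace splitting stands in for the
real analyzer, and boosts and the lossy one-byte norm encoding are
ignored.

```java
public class LengthNormSketch {
    // Same shape as the default length normalization: 1 / sqrt(terms in field).
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // All values added under one field name count toward a single total.
    // Assumption: whitespace splitting approximates the analyzer's tokens.
    public static int totalTerms(String[] values) {
        int total = 0;
        for (String value : values) {
            total += value.trim().split("\\s+").length;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] oneAlias = {"minihamuzu"};
        // Hypothetical extra aliases, only here to raise the term count.
        String[] fourAliases = {"minihamuzu", "aliasb", "aliasc", "aliasd"};

        System.out.println(lengthNorm(totalTerms(oneAlias)));    // 1 term  -> 1.0
        System.out.println(lengthNorm(totalTerms(fourAliases))); // 4 terms -> 0.5
    }
}
```

Both documents contain exactly the same matching alias, yet the one
with four aliases ends up with half the norm of the other.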
The trouble is I don't want to disable norms, because I want a match on
a single field containing fewer terms to score better than one with
more terms.
Full example:
http://musicbrainz.org/search/textsearch.html?query=minihamuzu&type=artist&limit=25&adv=on&handlearguments=1
returns two results. The second result only has a score of 8 because it
has more aliases than the first result, even though the alias it matched
on was an exact single-term match:
http://musicbrainz.org/show/artist/aliases.html?artistid=174327
But if I remove norms, then the following query (which currently works)
http://musicbrainz.org/search/textsearch.html?query=%22the+beatles%22&type=artist&limit=25&adv=on&handlearguments=1
would stop working, in that searching for 'The beatles' would no longer
score the artist 'The Beatles' higher than 'The Beatles revival Band'.
So is there any way to recognise that repeated calls to addField() are
not creating a single field with many terms, but many fields with few
terms?
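
To make the behaviour I am after concrete, here is a rough model in
plain Java. It is not Lucene API and not a proposed implementation:
every name in it is hypothetical, matching is a simple case-insensitive
token comparison, and the extra aliases are made up. The idea is that
the norm is taken from the single alias value that matched rather than
from the combined length of all aliases.

```java
public class PerValueNormSketch {
    // Norm of one value on its own, using the same 1 / sqrt(terms) shape.
    public static double valueNorm(String value) {
        int terms = value.trim().split("\\s+").length;
        return 1.0 / Math.sqrt(terms);
    }

    // Best norm among the values containing the query term; 0.0 if none match.
    public static double bestMatchingNorm(String[] values, String queryTerm) {
        String wanted = queryTerm.toLowerCase();
        double best = 0.0;
        for (String value : values) {
            for (String token : value.trim().toLowerCase().split("\\s+")) {
                if (token.equals(wanted)) {
                    best = Math.max(best, valueNorm(value));
                    break; // this value matched; move on to the next one
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] fewAliases = {"minihamuzu"};
        // Hypothetical extra aliases on the second document.
        String[] manyAliases = {"minihamuzu", "second alias", "third alias"};

        // Same matching alias, so both documents get the same norm.
        System.out.println(bestMatchingNorm(fewAliases, "minihamuzu"));
        System.out.println(bestMatchingNorm(manyAliases, "minihamuzu"));

        // A shorter matching value still beats a longer one.
        System.out.println(bestMatchingNorm(new String[]{"The Beatles"}, "beatles"));
        System.out.println(bestMatchingNorm(new String[]{"The Beatles Revival Band"}, "beatles"));
    }
}
```

In this model the number of aliases no longer matters, while the length
of the matched value still does, which is the combination I can't get
from either keeping or dropping norms.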
Thanks, Paul