lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Felipe <fel...@goshme.com>
Subject Re: Is there any difference in a document between one added field with a number of terms and a field added a number of times ?
Date Tue, 12 Jan 2010 20:44:42 GMT
You could change the boost of the field artist to be bigger than the field
alias.
    field.setBoost(artistBoost);


2010/1/12 Paul Taylor <paul_t100@fastmail.fm>

> Been doing some analysis with Luke (BTW doesnt work with StandardAnalyzer
> since Version field introduced) and discovered a problem with field lenghth
> boosting for me.
>
> I have a document that represents a recording artist (i.e Madonna, The
> Beatles ectera) it contains an artist and an alias field, the alias field
> contains other names that the artist is maybe known as, and so there can be
> multiple aliases for an artist.
>
> PseudoCode:
> (
> doc.addField(ArtistIndexField.ARTIST, rs.getString("name"));
> for (String alias : aliases.get(artistId)) {
>     doc.addField(ArtistIndexField.ALIAS, alias);
> }
> )
>
> Im finding that when I search by for the artist by the alias field if the
> value matches an alias in two different documents the document with the
> least number of aliases get the best score because the boost of the alias is
> split between the aliases on the other doc, if I ANALYSED_NO_NORMS then both
> documents return the same score.
>
> The trouble is I don't want to disable norms because I want a match on a
> single field containing less terms to score better than one with more
> scores.
>
> Full example:
>
>
> http://musicbrainz.org/search/textsearch.html?query=minihamuzu&type=artist&limit=25&adv=on&handlearguments=1
> return two results , the second result only has score of 8 because it more
> aliases than the first result, even the alias it matched on was an exact
> single term match.
> http://musicbrainz.org/show/artist/aliases.html?artistid=174327
>
> but if I remove norms then the following query (which is currently working)
>
>
> http://musicbrainz.org/search/textsearch.html?query=%22the+beatles%22&type=artist&limit=25&adv=on&handlearguments=1
>
> would stop working, in that  searching for 'The beatles' would no longer
> score rate artist 'The Beatles' better than 'The Beatles revival Band'
>
> So isn't there any way to recognise that repeated calls to addField() is
> not creating a single field with many terms,but many fields with few terms.
>
> thanks Paul
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Felipe Lobo
www.jusbrasil.com.br

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message