lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr gives the same fieldnorm for two different-size fields
Date Thu, 31 Jul 2014 18:54:57 GMT
You can consider, say, a copyField directive that copies the field into a
string type (or perhaps KeywordTokenizer followed by LowerCaseFilter), and
then match or boost on an exact match rather than trying to make scoring
fill this role.

In any case, I'm thinking of normalizing the sensitive fields and indexing
them as a single token (i.e. the string type or KeywordTokenizer) to
disambiguate these cases.
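
A minimal schema.xml sketch of that idea, assuming a tokenized source field
named "institution" and a hypothetical copy named "institution_exact" (both
field names and the "string_lc" type name are made up for illustration;
"text_general" is the type from the stock example schema):

```xml
<!-- Hypothetical type: the whole field value becomes one lowercased token -->
<fieldType name="string_lc" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercasing makes the exact match case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Tokenized field used for ordinary scoring (name is an assumption) -->
<field name="institution" type="text_general" indexed="true" stored="true"/>
<!-- Single-token copy used only for exact matching or boosting -->
<field name="institution_exact" type="string_lc" indexed="true" stored="false"/>

<copyField source="institution" dest="institution_exact"/>
```

With something like this in place, a quoted clause such as
institution_exact:"university of california berkeley"^10 could be added to
the query to lift exact matches above partial ones. The boost value is
arbitrary and would need tuning.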

Otherwise I fear you'll get one situation to work, then fail on the
next case. In your example, you're trying to use length normalization to
influence scoring so that the doc with the shorter field sorts above the
doc with the longer field. But what are you going to do when your target is
"university of california berkeley research"? Rely on matching all the
terms? And so on...

Best,
Erick


On Thu, Jul 31, 2014 at 10:26 AM, gorjida <ali@sciencescape.net> wrote:

> Thanks so much for your reply... In my case, it really matters because I am
> trying to find the correct institution match for an affiliation string...
> For example, if an author belongs to the "university of Toronto", his/her
> affiliation should be normalized against Solr... In this case,
> "University of California Berkeley Research" is a different place from
> "university of california berkeley"... I see that the top matches are tied
> in score for this specific example... I can break the tie using other
> techniques... However, I am keen to know whether this is a common problem
> in Solr?
>
> Regards,
>
> Ali
