lucene-java-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: Multi-valued field and numTerms
Date Thu, 15 Jan 2015 14:32:02 GMT
On 1/15/15 4:34 AM, rama44ster wrote:
> Hi,
> I am using lucene to index documents that have a multivalued text field
> named ‘city’.
> Each document might have multiple values for this field, like la, los
> angeles etc.
>
> Assuming
> document d1 contains city = la ; city = los angeles
> document d2 contains city = la mirada
> document d3 contains city = la quinta
>
> Now when I search for 'la', I would prefer getting d1 as it has the exact
> match, i.e., a match that has no extra terms beyond what is in the
> query. I read that Lucene already prefers documents with fewer terms, as
> DefaultSimilarity.computeNorm does
>
> return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
>
> The problem I have is, I am not sure how numTerms is calculated for a
> multivalued field like city. Here would numTerms for d1 be 1 or 3? Would
> the numTerms be the sum of all the numTerms for each field value?
>
> Any idea on how to make the document d1 rank higher than d2 and d3?
>
> Thanks in advance,
> Prasad.
>
One thing we have done to prefer "exact" matches is to index magic 
anchoring terms at the start and end of every field value and then use 
phrase queries to boost exact matches.  E.g., you would index

document 1 city = __anchor__ la __anchor__ ; city = __anchor__ los angeles __anchor__

then you can query for:

la "__anchor__ la __anchor__"^2

This won't do exactly what you asked for, but it might be what you want.
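
Here is a minimal sketch of the anchoring idea in plain Java, with no Lucene 
calls, just the string handling on the indexing and query side. The class 
and method names (AnchorSketch, anchor, anchorAll, exactBoostQuery) are made 
up for illustration. It assumes your analyzer keeps __anchor__ as a single 
unchanged token and that the query string goes to a parser using the classic 
"phrase"^boost syntax; special characters in the user query are not escaped 
here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnchorSketch {
    // Hypothetical marker token; it must survive analysis as one term
    // and must never occur in real field values.
    public static final String ANCHOR = "__anchor__";

    /** Wraps one field value with the anchor token at both ends. */
    public static String anchor(String value) {
        return ANCHOR + " " + value.trim() + " " + ANCHOR;
    }

    /** Anchors every value of a multivalued field before indexing. */
    public static List<String> anchorAll(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            out.add(anchor(v));
        }
        return out;
    }

    /**
     * Builds a query string: the plain user terms, plus a boosted phrase
     * that only matches when a field value is exactly the user query.
     */
    public static String exactBoostQuery(String userQuery, int boost) {
        return userQuery.trim() + " \"" + anchor(userQuery) + "\"^" + boost;
    }

    public static void main(String[] args) {
        // Values to index for d1's 'city' field.
        System.out.println(anchorAll(Arrays.asList("la", "los angeles")));
        // Query string for the user input 'la'.
        System.out.println(exactBoostQuery("la", 2));
    }
}
```

With this, d2 (la mirada) and d3 (la quinta) still match the plain term la, 
but only d1 has a value that matches the anchored phrase, so only d1 gets 
the boost.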

-Mike
