lucene-java-user mailing list archives

From Ian Lea
Subject Re: Score exact matches higher than matches that match analysed text but not original text
Date Tue, 10 Jan 2012 10:18:45 GMT
If a term has an accent, add both accented and unaccented versions at
index and search time.

So in your example your default field would contain

República Republica

A search for "República" would then expand to "República Republica",
match both terms, and score higher than a search for "Republica", which
would match only the unaccented version.
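
As a rough illustration of the expansion step, here is a minimal sketch in
plain Java, independent of Lucene. The class and method names
(AccentExpander, stripAccents, expand) are made up for this example, and it
assumes whitespace tokenisation and that "unaccented" means dropping Unicode
combining marks. In a real index the same logic would probably live in a
token filter that emits both forms.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands each whitespace-separated
// term into its original form plus, if different, its unaccented form.
class AccentExpander {

    // Decompose to NFD, then drop combining marks (accents).
    static String stripAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Returns the terms to index (or search for) given the raw text.
    static List<String> expand(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input produces no terms
            }
            // Compose first so the comparison below is reliable.
            String term = Normalizer.normalize(raw, Normalizer.Form.NFC);
            String folded = stripAccents(term);
            terms.add(term);
            if (!folded.equals(term)) {
                terms.add(folded); // accented term: add unaccented variant too
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "Rep\u00fablica" is República, escaped to avoid source-encoding issues.
        System.out.println(expand("Rep\u00fablica"));
        System.out.println(expand("Republica"));
    }
}
```

With this sketch an accented term yields two terms (accented and
unaccented), while an unaccented term yields only itself, which is what
makes the accented query match more terms in the accented document.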

It's not quite synonyms but you could borrow synonym code from
somewhere.  There's stuff in the Lucene contrib area and in LIA (Lucene
in Action) and maybe elsewhere.  I've used the LIA code to do something
similar.

An alternative would be to store accented versions in a separate field
and add a query for that field to the mix if you have accented terms.
You could boost that part of the query.
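
A minimal sketch of how that second query might be put together, as a
query string in the classic query parser syntax (field:term^boost). The
field names "name" and "name_exact", the boost of 2.0 and the class
AccentBoostQuery are all assumptions for illustration. It handles a single
term only, does no escaping of query parser special characters, and
assumes the exact field is indexed with an analyser that keeps accents.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: builds a query string that always
// searches the accent-folded field and, for accented input, adds a boosted
// clause against a separate field that keeps the accents.
class AccentBoostQuery {

    // Decompose to NFD, then drop combining marks (accents).
    static String stripAccents(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    static String build(String term, String foldedField, String exactField, float boost) {
        String composed = Normalizer.normalize(term, Normalizer.Form.NFC);
        String folded = stripAccents(composed);
        StringBuilder query = new StringBuilder();
        query.append(foldedField).append(':').append(folded);
        if (!folded.equals(composed)) {
            // Only accented input gets the extra, boosted clause.
            query.append(" OR ")
                 .append(exactField).append(':').append(composed)
                 .append('^').append(boost);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // "Rep\u00fablica" is República, escaped to avoid source-encoding issues.
        System.out.println(build("Rep\u00fablica", "name", "name_exact", 2.0f));
        System.out.println(build("Republica", "name", "name_exact", 2.0f));
    }
}
```

For an unaccented term the query is just the folded-field clause, so
scoring is unchanged for users who never type accents.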


On Tue, Jan 10, 2012 at 9:12 AM, Paul Taylor wrote:
> My analyser strips out accents as often these are not entered correctly, so
> assume there are two documents in the database with default field containing
> República
> Republica
> a search for República or Republica will return both results, each with a
> score of 1.
> It's correct that they both get returned, but it would be really nice if at
> the scoring stage it could recognise that if I had searched for República,
> the document containing República is a slightly better match than the other
> one and should score slightly higher, and vice versa.
> Is there any way to do this in Lucene? Alternatively, I thought about
> augmenting the scores returned by Lucene: when multiple results
> have the same score, check the number of matching letters and increase the
> score based on how many letters match, but only so far that it stays
> lower than any result that Lucene scored higher. I also realise that this
> makes sense when searching just one field but is more complex when the
> query searches over multiple fields, but I think in this case, when
> searching for artists/bands (music), I would only do the boost if the artist
> name was one of the search fields.
> Paul
