lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: multilanguage + howto search in all languages?
Date Thu, 29 Jan 2009 00:47:55 GMT
I'm not entirely sure about the fine points, but consider the
filters that are available that fold all the diacritics into their
low-ascii equivalents. Perhaps using that filter at *both* index
and search time on the English index would do the trick.

In your example, both would be 'munchen'. Straight English
would be unaffected by the filter, but any German words with
diacritics that crept in would be folded into their low-ascii
"equivalents". This would also work at index time, just in case
you indexed English text that had some German words.
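
To make the folding idea above concrete, here is a minimal Python sketch,
not Lucene or Solr code. The helper names (fold_to_ascii, build_index,
search) and the two sample documents are made up for illustration, and
Unicode decomposition stands in for whatever accent-folding filter is
actually configured. It only shows the principle: apply the same folding
to tokens at index time and to the query at search time.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Decompose characters, drop combining marks, and lowercase.

    Illustrative stand-in for an accent-folding filter (an assumption,
    not the actual Lucene/Solr implementation).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def build_index(docs: dict) -> dict:
    """Toy inverted index: folded token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(fold_to_ascii(token), set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Fold the query with the same function that was used at index time."""
    return index.get(fold_to_ascii(query), set())


# Hypothetical sample documents for the "English" index.
docs = {1: "Flights to München", 2: "Hotels in Munich"}
index = build_index(docs)

print(search(index, "munchen"))   # {1}
print(search(index, "München"))   # {1}
print(search(index, "muenchen"))  # set()
```

With this kind of folding, 'München' and 'munchen' meet at the same term.
The 'muenchen' spelling is a fine point worth checking: stripping
diacritics leaves 'ue' untouched, so matching it would presumably need an
extra step such as a synonym or a character mapping.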

NOTE: My experience is more on the Lucene side than the SOLR
side, but I'm sure the filters are available.

Best
Erick

On Wed, Jan 28, 2009 at 5:21 PM, Julian Davchev <jmut@drun.net> wrote:

> Hi,
> I currently have two indexes with solr, one for the english version and
> one for the german version. They use the english and german2 snowball
> factories respectively.
> Right now, depending on which language the website is currently in, I
> query the corresponding index.
> There is a requirement, though, that content is found regardless of
> which language it is in.
> So for example if searching for muenchen (will be caught correctly by
> german snowball factory as münchen) in english index it should be found.
> Right now
> it is not as I suppose english factory doesn't really care about umlauts.
>
> Any pointers are more than welcome. I am considering synonyms, but this
> will be kinda too heavy to follow/create.
> Cheers,
> JD
>
