lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Multilingual Solr
Date Mon, 06 Jun 2016 11:26:18 GMT
There is a language auto-detect UpdateRequestProcessor that can route
indexed content to differently suffixed fields. You can use either the
LangDetect-based one: http://www.solr-start.com/info/update-request-processors/#LangDetectLanguageIdentifierUpdateProcessorFactory
or the Tika-based one: http://www.solr-start.com/info/update-request-processors/#TikaLanguageIdentifierUpdateProcessorFactory
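To make the routing concrete, here is a minimal Python sketch of what field mapping does to a document once a language has been detected. In Solr this is driven by the processor's `langid.fl` (fields to inspect), `langid.langField` (where the detected code goes) and `langid.map=true` (rename to suffixed fields) parameters. The function name and the `keep_orig` flag below are hypothetical, and detection itself is not modelled: the language code is simply passed in.

```python
from typing import Any, Dict, Iterable


def map_to_language_fields(doc: Dict[str, Any], fields: Iterable[str], lang: str,
                           keep_orig: bool = False) -> Dict[str, Any]:
    """Return a copy of `doc` with each listed field renamed to <field>_<lang>.

    Hypothetical helper that only imitates the effect of langid.map=true;
    `lang` is assumed to come from a language detector.
    """
    out = dict(doc)  # never mutate the caller's document
    for name in fields:
        if name in out:
            # Either copy the value (keep the generic field) or move it.
            value = out[name] if keep_orig else out.pop(name)
            out[f"{name}_{lang}"] = value
    return out


doc = {"id": "1", "title": "Le petit prince", "body": "Il était une fois"}
print(map_to_language_fields(doc, ["title", "body"], "fr"))
# {'id': '1', 'title_fr': 'Le petit prince', 'body_fr': 'Il était une fois'}
```

Each suffixed field (`title_fr`, `title_en`, ...) can then be declared in the schema, for example through a dynamic field such as `*_fr`, with its own language-specific analyzer.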

To map the fields back during retrieval, you could use aliases, as I did
in my book example a few years ago:
https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml#L20
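One way to read "aliases" here is a minimal Python sketch that turns a custom `lang=XX_YY` value into standard request parameters. It uses edismax `qf` to search every language variant of a field, and the `fl` renaming syntax `alias:realfield` to return the chosen variant under the generic name. The helper name, the `_xx` suffix convention and the rule of keeping only the part before `_` are assumptions for illustration. The book's solrconfig.xml linked above may set things up differently, for example as request-handler defaults.

```python
from typing import Dict, Sequence
from urllib.parse import urlencode


def build_params(query: str, lang: str, fields: Sequence[str],
                 languages: Sequence[str]) -> Dict[str, str]:
    """Search all language variants but return only the chosen one.

    Hypothetical helper: assumes fields are named <field>_<xx> and that a
    locale such as "fr_FR" maps to the suffix "fr".
    """
    suffix = lang.split("_", 1)[0].lower()
    if suffix not in languages:
        raise ValueError(f"unsupported language: {lang}")
    # Query every language-specific field; each keeps its own analyzer.
    qf = " ".join(f"{f}_{code}" for f in fields for code in languages)
    # fl aliasing: expose title_fr under the generic name "title".
    fl = ",".join(["id"] + [f"{f}:{f}_{suffix}" for f in fields])
    return {"q": query, "defType": "edismax", "qf": qf, "fl": fl}


params = build_params("prince", "fr_FR", ["title"], ["en", "fr", "de"])
print(params)
# {'q': 'prince', 'defType': 'edismax', 'qf': 'title_en title_fr title_de', 'fl': 'id,title:title_fr'}
print(urlencode(params))  # query string for a /select request
```

Doing this translation in the client (or in a thin proxy) keeps the field name in responses constant without needing one core per language.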

Does this cover your needs?

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 6 June 2016 at 06:57, Riedl, Johannes
<johannes.riedl@uni-tuebingen.de> wrote:
> Hi all,
>
> we are currently looking for a solution for switching between different languages in
> the query results while keeping the possibility to search in several languages in
> parallel. The overall aim would be a constant field name and an additional Solr parameter
> "lang=XX_YY" that returns the results in the chosen language while searches are applied
> to all languages. Setting up several cores to obtain a generic field name is not an option.
> Does anyone know of a clean way to achieve this, particularly routing content indexed to a
> generic field (e.g. title) to a "background field" (e.g. title_en, title_fr, etc.) on the fly
> and retrieving it from there depending on the language chosen?
>
> Background: So far, we have investigated the multi-language field approach offered by
> Trey Grainger in the code examples for "Solr in Action" (https://github.com/treygrainger/solr-in-action.git,
> chapter 14). It is an extension of the ordinary TextField that allows using a generic field name:
> the language is encoded at the beginning of the field content, and the appropriate index and
> query analyzers are associated with dummy fields in schema.xml. If there is a way to store data
> in these dummy fields and additionally add the lang parameter, we might be done.
>
> Thanks a lot, best regards
>
> Johannes
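The content-prefix approach described in the quoted message can be illustrated with a minimal Python sketch: strip a language prefix from the field value and look up which dummy fields (and hence which analyzers) should handle the remaining text. The bracketed `[en,fr]` prefix format and both function names are hypothetical. The actual syntax and implementation in the "Solr in Action" chapter 14 code may differ, so check that repository before relying on this.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical prefix format "[en,fr]content"; the book's real syntax may differ.
PREFIX = re.compile(r"^\[([A-Za-z_, ]+)\](.*)$", re.DOTALL)


def split_language_prefix(value: str, default: str = "en") -> Tuple[List[str], str]:
    """Split '[en,fr]text' into (['en', 'fr'], 'text'); no prefix -> default."""
    m = PREFIX.match(value)
    if not m:
        return [default], value
    langs = [code.strip().lower() for code in m.group(1).split(",") if code.strip()]
    return (langs or [default]), m.group(2)


def analyzer_fields(value: str, dummy_fields: Dict[str, str],
                    default: str = "en") -> Tuple[List[str], str]:
    """Map the prefixed languages to dummy schema fields whose analyzers apply.

    Languages with no configured dummy field are silently skipped.
    """
    langs, text = split_language_prefix(value, default)
    return [dummy_fields[code] for code in langs if code in dummy_fields], text


dummies = {"en": "text_en", "fr": "text_fr", "de": "text_de"}
print(analyzer_fields("[en,fr]Le petit prince", dummies))
# (['text_en', 'text_fr'], 'Le petit prince')
```

In the real field type this happens inside Solr's analysis chain rather than in client code, which is why a single generic field name can be used in both the schema and queries.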
