lucene-solr-user mailing list archives

From "Jack Krupansky" <>
Subject Re: Why shouldn't lang-id component work at query-time?
Date Sun, 07 Jul 2013 17:47:35 GMT
The problem at query time is simple: a typical query has too few terms to 
reliably identify the language using statistical techniques, especially for 
a language like English which is famous for "borrowing" words from other 
languages. I mean, is "raison d'être" REALLY French anymore? Or, are 
"sombrero" or "poncho" or "mañana" really strictly Spanish anymore?

Multi-lingual support is an art/craft; don't expect cookbook answers that 
will apply to all apps in all environments.

That said, Edismax searching of multiple fields, one for each language, is 
probably the best you're going to do without doing something 
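Here is a minimal sketch of that approach. It assumes one field per language, named like the `text_it` field in the original question; `text_en`, `text_fr` and `text_es` are hypothetical. `defType=edismax` and `qf` are standard Solr request parameters. Listing every language field in `qf` lets each field apply its own query analyzer, so the client never has to name a language.

```python
from urllib.parse import urlencode

# Hypothetical per-language fields; text_it mirrors the field in the question,
# the others are assumed to follow the same naming pattern.
LANGUAGE_FIELDS = ("text_en", "text_it", "text_fr", "text_es")


def edismax_query_url(base: str, user_query: str, fields=LANGUAGE_FIELDS) -> str:
    """Build a select URL that searches every language field at once."""
    params = {
        "defType": "edismax",    # use the Extended DisMax query parser
        "q": user_query,         # raw user input, no language prefix needed
        "qf": " ".join(fields),  # each field is analyzed with its own analyzer
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(edismax_query_url("/solr/collection/select", "abc abc"))
```

If some languages dominate the results, per-field boosts in `qf` (e.g. `text_en^2`) are one knob to tune, but the right weights are app-specific.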

-- Jack Krupansky

-----Original Message----- 
From: adfel70
Sent: Sunday, July 07, 2013 1:32 PM
Subject: Why shouldn't lang-id component work at query-time?

I'm trying to integrate Solr's lang-id component into my Solr environment.
In my scenario, I have documents in many different languages. I want to
index them in the same Solr collection, in different fields, and apply
language-specific analyzers to each field according to its language.
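On the indexing side, the lang-id update processor handles this routing. With field-name mapping enabled (`langid.map=true`), a field listed in `langid.fl` is renamed to a per-language variant such as `text_it`. The plain-Python sketch below only mimics that idea so the data flow is visible. `map_fields`, `SUPPORTED_LANGS` and `FALLBACK_LANG` are hypothetical names. The `<field>_<lang>` naming is an assumption chosen to match the `text_it` field used in this message.

```python
from typing import Dict, Iterable, Optional

# Hypothetical configuration for this sketch, not Solr's actual settings.
SUPPORTED_LANGS = {"en", "it", "fr", "es"}
FALLBACK_LANG = "en"  # assumed language when detection fails or is unsupported


def map_fields(doc: Dict[str, str], detected_lang: Optional[str],
               fields: Iterable[str] = ("text",)) -> Dict[str, str]:
    """Rename each listed field to <field>_<lang>; copy other fields unchanged."""
    lang = detected_lang if detected_lang in SUPPORTED_LANGS else FALLBACK_LANG
    to_map = set(fields)
    mapped = {}
    for name, value in doc.items():
        key = f"{name}_{lang}" if name in to_map else name
        mapped[key] = value
    return mapped


if __name__ == "__main__":
    print(map_fields({"id": "1", "text": "ciao mondo"}, "it"))
    # {'id': '1', 'text_it': 'ciao mondo'}
```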

So far the lang-id component does exactly what I need.

The problem is that in all the recipes I've read, at query time I eventually
have to indicate which language I'm querying.
Either by specifying the field I want to search:
/solr/collection/select?q=text_it:abc abc
Or by creating a language-specific request handler which I would have to use
like this:
/solr/collection/selectIT?q=text:abc abc
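Both request styles above can be sketched as follows. In each function, `lang` is a required input, and supplying it is exactly the burden placed on the client. The helper names are hypothetical, and `/selectIT` stands for the language-specific handler from the example. Note also that the standard query parser applies `text_it:abc abc` only to the first term, so the second `abc` goes to the default field. The sketch therefore groups the terms in parentheses.

```python
from urllib.parse import urlencode


def field_query_url(base: str, lang: str, terms: str) -> str:
    """Style 1: the language is encoded in the field name, e.g. text_it."""
    # Parentheses make the field apply to every term, not just the first.
    q = f"text_{lang}:({terms})"
    return f"{base}/select?" + urlencode({"q": q})


def handler_query_url(base: str, lang: str, terms: str) -> str:
    """Style 2: a per-language request handler, e.g. /selectIT."""
    q = f"text:({terms})"
    return f"{base}/select{lang.upper()}?" + urlencode({"q": q})
```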

Either way, I must tell Solr the language, which in my case (a web client
and many different languages) is quite problematic.

I was wondering why the lang-id component shouldn't provide full support for
indexing and querying in multiple languages, with the language transparent to
the client at both index time and query time.
This could be achieved by applying the same language-detection tool at query
time.

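One way to combine the query-time detection idea with the short-query caveat in the reply is sketched below. The code trusts the detected language only when the query is long enough and the detector is confident. Otherwise it falls back to searching all language fields with edismax. The detector is passed in as a function because no particular detection library is assumed. `route_query`, `MIN_TERMS`, `MIN_CONFIDENCE` and the field names are all hypothetical.

```python
from typing import Callable, Tuple
from urllib.parse import urlencode

# Hypothetical field names and thresholds for this sketch.
LANG_FIELDS = {"en": "text_en", "it": "text_it", "fr": "text_fr", "es": "text_es"}
MIN_TERMS = 4         # shorter queries are not trusted to a single language
MIN_CONFIDENCE = 0.9  # minimum detector confidence to route to one field

# A detector returns (language code, confidence in [0, 1]); supplied by caller.
Detector = Callable[[str], Tuple[str, float]]


def route_query(base: str, user_query: str, detect: Detector) -> str:
    lang, confidence = detect(user_query)
    enough_terms = len(user_query.split()) >= MIN_TERMS
    if enough_terms and confidence >= MIN_CONFIDENCE and lang in LANG_FIELDS:
        qf = LANG_FIELDS[lang]               # confident: search one language
    else:
        qf = " ".join(LANG_FIELDS.values())  # unsure: search every language field
    params = {"defType": "edismax", "q": user_query, "qf": qf}
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    stub = lambda q: ("it", 0.99)  # stand-in for a real language detector
    print(route_query("/solr/collection/select", "abc abc", stub))
```

The thresholds are the weak point. A two-word query like `abc abc` always takes the fallback path here, which is the behavior the reply argues is unavoidable.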
Any insights?

Sent from the Solr - User mailing list archive at 
