From: "Jack Krupansky"
Reply-To: solr-user@lucene.apache.org
Subject: Re: Why shouldn't lang-id component work at query-time?
Date: Sun, 7 Jul 2013 13:47:35 -0400

The problem at query time is simple: a typical query has too few terms to reliably identify the language using statistical techniques, especially for a language like English, which is famous for "borrowing" words from other languages. I mean, is "raison d'être" REALLY French anymore? Or, are "sombrero" or "poncho" or "mañana" really strictly Spanish anymore?

Multi-lingual support is an art/craft; don't expect cookbook answers that will apply to all apps in all environments.

That said, Edismax searching of multiple fields, one for each language, is probably the best you're going to do without doing something super-sophisticated.

-- Jack Krupansky

-----Original Message-----
From: adfel70
Sent: Sunday, July 07, 2013 1:32 PM
To: solr-user@lucene.apache.org
Subject: Why shouldn't lang-id component work at query-time?

Hi,
I'm trying to integrate Solr's lang-id component in my Solr environment.
In my scenario, I have documents in many different languages. I want to index them in the same Solr collection, in different fields, and apply a language-specific analyzer to each field according to its language.
So far the lang-id component does exactly what I need.

The problem is that in all the recipes I've read, at query time I eventually have to indicate which language I'm querying, either by specifying the field I want to search:
/solr/collection/select?q=text_it:abc abc
or by creating a language-specific request handler, which I would have to use like this:
/solr/collection/selectIT?q=text:abc abc

Either way, I must tell Solr the language, which in my case (a web client plus many different languages) is quite problematic.

I was wondering why the lang-id component shouldn't provide a full ability to index and query multiple languages, with the language transparent to the client both at indexing and at query time. This could be achieved by applying the same language-detection tool at query time.

Any insights?
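
For reference, the Edismax suggestion in the reply could look roughly like the sketch below. It only builds the request URL: `defType=edismax` selects the parser and `qf` lists the per-language fields to search, so the client never names a language. The `/solr/collection/select` path and the `text_it` field come from the original question; `text_en`, `text_fr` and the `build_edismax_url` helper are assumptions made for illustration, and each field is assumed to have its own language-specific analyzer in the schema.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url, user_query, language_fields):
    """Build a Solr select URL that searches every per-language field at once.

    language_fields is a list of field names (hypothetical here), one per
    language, each assumed to be analyzed with its own language-specific chain.
    """
    params = {
        "q": user_query,                  # raw user input, no field prefix
        "defType": "edismax",             # use the Extended DisMax query parser
        "qf": " ".join(language_fields),  # query all language fields together
    }
    return f"{base_url}?{urlencode(params)}"


# Field names other than text_it are illustrative assumptions.
url = build_edismax_url(
    "/solr/collection/select",
    "abc abc",
    ["text_en", "text_it", "text_fr"],
)
print(url)
```

With this form the query terms are analyzed separately for each listed field, so a match in any one language's field is enough to return the document. Per-field boosts (for example `text_en^2`) can be added to `qf` if one language should be favoured.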