From: Mike Klaas
To: solr-user@lucene.apache.org
Subject: Re: multi-language searching with Solr
Date: Tue, 6 May 2008 15:43:41 -0700

On 5-May-08, at 1:28 PM, Eli K wrote:

> Wouldn't this impact both indexing and search performance and the size
> of the index?
> It is also probable that I will have more then one free text fields
> later on and with at least 20 languages this approach does not seem
> very manageable. Are there other options for making this work with
> stemming?

If you want stemming, then you have to execute one query per language
anyway, since the stemming will be different in every language.

This is a fundamental requirement: you somehow need to track the
language of every token if you want correct multi-language stemming.
The easiest way to do this would be to split each language into its
own field. But there are other options: you could prefix every
indexed token with the language:

en:The en:quick en:brown en:fox en:jumped ...
fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...

Separate fields seems easier to me, though.

-Mike
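
For illustration only, here is a rough Python sketch of the token-prefixing idea described above. It is not Solr code and not anything from this thread: the suffix lists are toy stand-ins for real per-language stemmers, and the names `toy_stem`, `prefixed_terms`, `build_index` and `search` are hypothetical. It only shows why the language has to travel with every token, and why a query ends up being run once per language.

```python
from collections import defaultdict

# Toy suffix rules standing in for real per-language stemmers (assumption:
# a real setup would plug in a proper stemmer for each language).
TOY_SUFFIXES = {
    "en": ("ing", "ed", "s"),
    "fr": ("ait", "er", "é", "s"),
}


def toy_stem(token, lang):
    """Lowercase the token and strip the first matching suffix for lang."""
    word = token.lower()
    for suffix in TOY_SUFFIXES.get(lang, ()):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prefixed_terms(text, lang):
    """Stem each whitespace-separated token and tag it with its language."""
    return [f"{lang}:{toy_stem(tok, lang)}" for tok in text.split()]


def build_index(docs):
    """docs: iterable of (doc_id, lang, text). Returns term -> set of ids."""
    index = defaultdict(set)
    for doc_id, lang, text in docs:
        for term in prefixed_terms(text, lang):
            index[term].add(doc_id)
    return index


def search(index, query, lang):
    """Return ids of documents containing every query term for one language."""
    postings = [index.get(term, set()) for term in prefixed_terms(query, lang)]
    return set.intersection(*postings) if postings else set()


docs = [
    (1, "en", "The quick brown fox jumped"),
    (2, "fr", "Le brun renard vite a sauté"),
]
index = build_index(docs)

# The same user query has to be analyzed and run once per language.
for lang in ("en", "fr"):
    print(lang, search(index, "jumping fox", lang))
```

With separate fields, the language tag would live in the field name rather than in the term itself, but the per-language analysis at index time and at query time stays the same.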