From: Bess Sadler
Subject: Re: Internationalization
Date: Tue, 16 Jan 2007 11:48:38 -0500
To: solr-user@lucene.apache.org

Hi, Jörg.

At the Tibetan Himalayan Digital Library, we are working with XML files whose fields might be in Tibetan, Chinese, Nepalese, or English. Our Solr schema.xml file looks like this:

[schema excerpt not preserved in the archived message]

I run all of our XML data through an XSL transformation that puts it into Solr-indexable form, works out what language each field is in, and gives the field an appropriate name, e.g., "location_eng" or "formalname_tib". So far this is working very well for us.

Currently, we assign all fields, no matter what language, to type string, defined as

[field type definition not preserved in the archived message]

This does string matching very well, but doesn't do stop words, stemming, or anything fancy. We are toying with the idea of a custom Tibetan indexer to better break up the Tibetan into discrete words, but for this particular project (because it mostly has to do with proper names, not long passages of text) this hasn't been a problem yet, and the above solution seems to be doing the trick.

I hope this helps. Good luck!

Bess

On Jan 16, 2007, at 10:23 AM, Jörg Pfründer wrote:

> Hello,
>
> is there anyone who has experience on internationalization
> (internationalisation) with SOLR?
>
> How do you setup a multi language data index? Should we use a
> dynamic field like text_en, text_fr, text_es?
>
> Is there a GermanPorterFilterFactory or FrenchPorterFilterFactory?
>
> Thank you very much.
>
> Jörg Pfründer

Elizabeth (Bess) Sadler
Head, Technical and Metadata Services
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

bess@virginia.edu
(434) 243-2305
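
For readers who want to try the field-naming approach described in the message, here is a minimal sketch of the idea in Python. It is not the XSL transformation used at the library, which is not shown in the message. It assumes the language of each value is already known, and the helper names (`solr_field_name`, `build_add_doc`) are hypothetical. The suffixes "eng" and "tib" come from the message; "chi" and "nep" are assumed for illustration. The output uses Solr's XML update format (`<add><doc><field name="...">`).

```python
import xml.etree.ElementTree as ET

# "eng" and "tib" appear in the message; "chi" and "nep" are assumptions.
KNOWN_SUFFIXES = {"eng", "tib", "chi", "nep"}


def solr_field_name(base, lang):
    """Return a language-suffixed field name such as 'location_eng'."""
    if lang not in KNOWN_SUFFIXES:
        raise ValueError(f"unknown language suffix: {lang!r}")
    return f"{base}_{lang}"


def build_add_doc(fields):
    """Build a Solr <add><doc> update message from (base, lang, value) triples."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for base, lang, value in fields:
        field = ET.SubElement(doc, "field", name=solr_field_name(base, lang))
        field.text = value
    # encoding="unicode" returns a str and leaves non-ASCII text unescaped.
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_doc([
        ("location", "eng", "Lhasa"),
        ("formalname", "tib", "ལྷ་ས"),
    ]))
```

On the Solr side, field names built this way would be matched by whatever per-language field or dynamic-field definitions the schema declares, so the indexing code never needs to know the analysis settings for each language.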