Message-ID: <4D368A6F.7090304@ariba.com>
Date: Wed, 19 Jan 2011 12:23:35 +0530
From: Vinaya Kumar Thimmappa
Reply-To: vthimmappa@ariba.com
Organization: Ariba
To: java-user@lucene.apache.org
Subject: Re: Best practices for multiple languages?

I think we should use Lucene with the Snowball jars, which means one index for all languages (of course, the size of the index is always a concern).

Hope this helps.
-vinaya

On Tuesday 18 January 2011 11:23 PM, Clemens Wyss wrote:
> What is the "best practice" to support multiple languages, i.e. Lucene Documents that have multiple-language content/fields?
> Should
> a) each language be indexed in a separate index/directory, or should
> b) the Documents (in a single directory) hold the diverse localized fields?
>
> We will most often be searching "language dependent", which (at least performance-wise) mandates one directory per language...
>
> Any (Lucene-specific) white papers on this topic?
>
> Thx in advance
> Clemens
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
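
For what it's worth, option (b) combined with the single-index suggestion could be laid out roughly as below. This is only a sketch in plain Java, not Lucene API: the class LocalizedFields, the "<base>_<lang>" field naming convention, and both helper methods are hypothetical names made up for illustration. The assumption is that each per-language field would then be paired with a matching Snowball stemmer (for example through a per-field analyzer wrapper), which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: plans per-language field names
// for a single index that holds all languages.
final class LocalizedFields {

    // Assumed naming convention: "<base>_<language code>", e.g. "body_de".
    static String fieldFor(String base, String lang) {
        return base + "_" + lang.toLowerCase(Locale.ROOT);
    }

    // Maps each per-language field name to the stemmer language name that
    // the analyzer for that field would be configured with.
    static Map<String, String> stemmerByField(String base, Map<String, String> stemmerByLang) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stemmerByLang.entrySet()) {
            result.put(fieldFor(base, e.getKey()), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stemmers = new LinkedHashMap<>();
        stemmers.put("en", "English");
        stemmers.put("de", "German");
        stemmers.put("fr", "French");
        // One entry per localized field that a Document would carry.
        System.out.println(stemmerByField("body", stemmers));
    }
}
```

A language-dependent search would then query only the field for that language (e.g. fieldFor("body", "de")), so the single index can still be searched one language at a time. Whether this performs as well as one directory per language is something to measure rather than assume.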