From: Ted Dunning
Date: Mon, 11 May 2009 08:13:43 -0700
Subject: Re: what if my database data contains other language (like danish, german).
To: general@lucene.apache.org

Yes, Lucene can handle that. You have to select which stemmer to use, and you
may have to improve the German and Danish stemmers a little.

You may also run into a weighting problem. If Danish is 5% of your corpus,
then words that occur in 100% of your Danish documents will tend to get
weights that are too high, because they occur in only 5% of your documents
overall. Any term that occurs in more than 20% of a sub-corpus should
generally be discarded from your query, which can be difficult in
multilingual situations. For a first pass, however, I would ignore this
issue.

On Mon, May 11, 2009 at 4:07 AM, uday kumar maddigatla wrote:

> what if my database data contains other language (like danish, german).
>
> Is Lucene will handle that .
>
> If yes How?
>

--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
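
To make the weighting concern in the reply concrete, here is a minimal
sketch in plain Python rather than Lucene code. It assumes the classic
Lucene-style inverse document frequency, 1 + ln(N / (df + 1)). The corpus
sizes, the sample Danish words, and the helper names idf and
filter_query_terms are all hypothetical and chosen only for illustration.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency, assuming the classic 1 + ln(N / (df + 1)) form."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def filter_query_terms(terms, sub_doc_freq, sub_num_docs, max_fraction=0.20):
    """Keep only terms found in at most max_fraction of the sub-corpus documents."""
    return [
        term
        for term in terms
        if sub_doc_freq.get(term, 0) / sub_num_docs <= max_fraction
    ]


# Hypothetical corpus: 1000 documents, of which 50 (5%) are Danish.
total_docs = 1000
danish_docs = 50

# Hypothetical document frequencies inside the Danish sub-corpus.
# "og" appears in every Danish document; "fiskeri" in only three.
danish_doc_freq = {"og": 50, "fiskeri": 3}

# Measured against the whole corpus, "og" looks rare and gets a high weight.
global_weight = idf(total_docs, danish_doc_freq["og"])

# Measured against Danish documents only, it is clearly a common word.
danish_weight = idf(danish_docs, danish_doc_freq["og"])

# Applying the 20% rule per sub-corpus drops "og" from the query.
kept_terms = filter_query_terms(["og", "fiskeri"], danish_doc_freq, danish_docs)

print(global_weight, danish_weight, kept_terms)
```

The filter needs per-language document counts and frequencies, which is the
part that gets difficult in a mixed-language index: you first have to know
which language each document is in.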