From: Karl Wettin
To: java-user@lucene.apache.org
Subject: Re: lucene for Arabic and Urdu
Date: Tue, 18 Sep 2007 23:38:21 +0200

On 18 Sep 2007, at 23:23, Liaqat Ali wrote:

> I'm new to the field of Information Retrieval and am now working to
> develop a search engine for languages like Arabic and Urdu. Kindly
> guide me on how Lucene can be utilized for this purpose.

Lucene makes no distinction between languages. All data is discrete chunks of characters, also known as tokens. Tokens are stored in fields, and the combination of a token and a specific field is known as a term.

Which tokens your index ends up containing depends on the analyzer strategy you use. An analyzer can be language-sensitive, but it can also be something completely different.

> Can anybody tell me exactly what I should do to design a search
> engine from scratch using Lucene?

You need to define what your search engine is supposed to do in order to get an answer that makes sense. Lucene in Action is a pretty good book, even though it covers version 1.4 or so. The SVN repository contains a demo application. There is also the wiki and this forum.

-- 
karl
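
To make the token / field / term distinction above concrete, here is a minimal sketch in plain Java. It is not Lucene's API: the class TinyIndexSketch, its methods, and the letterAnalyzer function are all hypothetical names made up for illustration. It only assumes that an analyzer is "something that turns text into tokens" and that a term is a token paired with a field, as described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Illustration only: this is NOT Lucene's API. All names are hypothetical.
class TinyIndexSketch {
    // The "analyzer": any strategy that turns raw text into tokens.
    private final Function<String, List<String>> analyzer;
    // A term is a (field, token) pair; each term maps to the IDs of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    TinyIndexSketch(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    // A language-neutral analyzer: lowercases and splits on anything that is
    // not a letter or digit. It works on Unicode code points, so Arabic-script
    // words come out as tokens just like Latin-script ones. It does no
    // stemming or normalization and treats combining marks as separators.
    static List<String> letterAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    private static String term(String field, String token) {
        return field + ":" + token;
    }

    // Analyze the text and record every resulting term for this document.
    void addDocument(int docId, String field, String text) {
        for (String token : analyzer.apply(text)) {
            postings.computeIfAbsent(term(field, token), k -> new TreeSet<>()).add(docId);
        }
    }

    // Analyze the query the same way and return documents containing all of its tokens.
    Set<Integer> search(String field, String queryText) {
        Set<Integer> result = null;
        for (String token : analyzer.apply(queryText)) {
            Set<Integer> docs = postings.getOrDefault(term(field, token), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyIndexSketch index = new TinyIndexSketch(TinyIndexSketch::letterAnalyzer);
        // The escaped word is "kitab" (book), spelled the same in Arabic and Urdu.
        String kitab = "\u0643\u062a\u0627\u0628";
        index.addDocument(0, "title", kitab + " Lucene");
        index.addDocument(1, "title", "Lucene in Action");

        System.out.println(index.search("title", "lucene")); // [0, 1]
        System.out.println(index.search("title", kitab));    // [0]
        System.out.println(index.search("body", kitab));     // [] (same token, different field)
    }
}
```

The index code never looks at the language; only the analyzer passed to the constructor decides what counts as a token. Swapping in a language-aware analyzer (for example one that strips diacritics or stems words) would change what ends up in the index without touching anything else.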