Message-ID: <20040223114943.46562.qmail@web25010.mail.ukl.yahoo.com>
Date: Mon, 23 Feb 2004 11:49:43 +0000 (GMT)
From: Clandes Tino
Subject: Multilanguage and wildcard support
To: lucene-user@jakarta.apache.org

Hi, all.

I would like to describe my dilemma about analysis.

2. Multilanguage and wildcard support

In Lucene 1.3 Final I found a very useful class, PerFieldAnalyzerWrapper, which lets me specify a separate analyzer for each field. However, the full-text content (obtained by parsing Word, Excel, XML or other kinds of documents) should be searchable with stemming and should also support wildcard queries.

I implemented this solution:

- The full content is indexed in two separate fields, because wildcard queries do not pass through the analyzer, as I have read in this mailing list's archive.
  Field1 ("stemmingbody"): the matching Snowball analyzer is used.
  Field2 ("plainbody"): WhitespaceAnalyzer is used.

So, when a user searches for a term in an item's content, I parse the query. If it contains a wildcard character, the search is performed on "plainbody"; otherwise I search "stemmingbody", expecting better results that way.

Is there a better way to do this, e.g. indexing the full content in only one field instead of two (I tokenize and index it, but do not store it)?

Thanks in advance for any opinion or suggestion!

Best regards
Milan Agatonovic
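
The field-selection step described above can be sketched roughly as follows. This is only an illustration in plain Java with no Lucene calls: the class BodyFieldRouter and its methods are hypothetical names, the field names are the ones from the post, and it assumes that '*' and '?' are the wildcard characters and that a backslash escapes the character after it.

```java
// Hypothetical helper (not part of Lucene): decides which indexed field a
// query should be run against, following the rule described in the post.
final class BodyFieldRouter {
    // Field names as given in the post.
    static final String STEMMED_FIELD = "stemmingbody";
    static final String PLAIN_FIELD = "plainbody";

    private BodyFieldRouter() {}

    /** Returns true if the query text contains an unescaped '*' or '?'. */
    static boolean hasWildcard(String queryText) {
        for (int i = 0; i < queryText.length(); i++) {
            char c = queryText.charAt(i);
            if (c == '\\') {
                i++; // assumption: a backslash escapes the next character, so skip it
            } else if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    /** Wildcard queries go to the unstemmed field, everything else to the stemmed one. */
    static String chooseField(String queryText) {
        return hasWildcard(queryText) ? PLAIN_FIELD : STEMMED_FIELD;
    }
}
```

With this sketch, BodyFieldRouter.chooseField("comput*") selects "plainbody" and BodyFieldRouter.chooseField("computing") selects "stemmingbody"; the selected name would then be used as the field when the query is parsed. Note that the decision is made for the whole query string, so a query mixing wildcard and plain terms is sent entirely to "plainbody".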