Message-ID: <3E3D4AF7.4050305@root.cz>
Date: Sun, 02 Feb 2003 17:44:39 +0100
From: Lukas Zapletal
Reply-To: zapletal@inf.upol.cz
To: Lucene Developers List
Subject: Re: Escaping bug \( and ? or *
References: <3E3ADC2D.4060005@root.cz> <200302011347.57426.tatu@hypermall.net>
In-Reply-To: <200302011347.57426.tatu@hypermall.net>

Tatu Saloranta wrote:

>I think the problem is that the analyzer you used for the indexer strips out
>parentheses. So, the text actually indexed would look something like:
>"test 1 test 2" (assuming 'and' is a stop word and is removed). Thus there's
>no token matching the term "(1)" or "(2)".
>Same goes for most other punctuation characters; they are routinely
>stripped by the analyzer, as they usually are not very useful for searching.
>
>To make it work the way you want, you need to modify the analyzer to
>include parentheses, perhaps so that they are included only if
>they contain just a single alphanumeric token (otherwise
>"(1 and 2)" would be tokenized to "(1" and "2)", which is probably
>not what you want).
>

Well, I don't think this is the cause. I use the same analyzer for queries
as well, so parentheses and other punctuation are also stripped when I
build the query. The query term "(1)" should therefore be reduced to "1",
just like the indexed text, and still match.

This MAY be a bug. PLEASE TEST THE CODE.

-- 
Lukas Zapletal [lzap@root.cz]
http://www.tanecni-olomouc.cz/lzap
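
To make the argument concrete, here is a minimal sketch of the behavior I would expect when the same analysis is applied at index time and at query time. It is NOT Lucene code: the `analyze`, `build_index` and `search` functions, the stop-word set, and the sample documents are all assumptions made up for illustration (the first document is only inferred from the "test 1 test 2" example quoted above), and it ignores whatever the query parser does with special characters before analysis.

```python
import re

# Assumption: 'and' is the only stop word, as in the quoted example.
STOP_WORDS = {"and"}


def analyze(text):
    """Hypothetical analyzer: lowercase, keep only runs of letters and
    digits (so parentheses and other punctuation vanish), drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Build a toy inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query token."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Sample documents are invented for this sketch.
docs = {1: "test (1) and test (2)", 2: "test (3)"}
index = build_index(docs)

print(analyze("test (1) and test (2)"))  # tokens that end up in the index
print(search(index, "(1)"))              # query goes through the same analyzer
```

In this sketch the query "(1)" is reduced to the token "1" before lookup, so it finds the first document even though no "(1)" token was ever indexed. If the real search does not behave this way with an identical analyzer on both sides, the difference has to come from somewhere other than the stripped parentheses.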