From: Tatu Saloranta
To: "Lucene Users List"
Subject: Re: '-' character not interpreted correctly in field names
Date: Mon, 3 Feb 2003 08:37:18 -0700

On Monday 03 February 2003 07:19, Terry Steichen wrote:
> I believe that the tokenizer treats a dash as a token separator. Hence,
> the only way, as I recall, to eliminate this behavior is to modify
> QueryParser.jj so it doesn't do this. However, doing this can cause some
> other problems, like hyphenated words at a line break and the like.

Might it be enough to just replace the analyzer passed in to QueryParser? That would work if QueryParser only handles modifiers outside terms and passes the terms themselves to the analyzer. I think this is the case (QueryParser does call the analyzer in a couple of places, and one word may actually expand to a phrase or vice versa).

Still, it seems that using a hyphen as a separator shouldn't necessarily cause big problems when the indexer does the same; queries against "2 - 5" would be phrase queries for "2 5", which is still reasonably specific (and should match the content).

On the other hand, the simple analyzer and the standard analyzer have pretty different tokenization rules, so it's important to make sure the same analyzer is used for both indexing and searching (a mismatch can easily prevent matches).

-+ Tatu +-
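
To make the point about matching tokenization concrete, here is a rough sketch in plain Java. It does not use Lucene at all: the two tokenizer methods and the phrase check are hypothetical stand-ins I made up for illustration, and they are not the actual rules of the simple or standard analyzer. The first tokenizer treats anything that is not a letter or digit (including '-') as a separator; the second splits on whitespace only and keeps hyphens inside tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class HyphenTokenDemo {

    // Hypothetical stand-in for an analyzer that treats every character that
    // is not a letter or digit (including '-') as a token separator.
    static List<String> splitOnNonAlnum(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Hypothetical stand-in for an analyzer that splits on whitespace only,
    // so a hyphen stays inside (or becomes) a token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Simplified phrase query: the query tokens must appear consecutively,
    // in order, among the indexed tokens.
    static boolean phraseMatches(List<String> indexed, List<String> query) {
        if (query.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(indexed, query) >= 0;
    }

    public static void main(String[] args) {
        String content = "Sizes 2-5 are in stock";
        String query = "2 - 5";

        // Same tokenizer on both sides: the query becomes the phrase "2 5"
        // and the content contains the tokens 2 and 5 next to each other.
        System.out.println(phraseMatches(
                splitOnNonAlnum(content), splitOnNonAlnum(query)));   // true

        // Mismatch: the index holds the single token "2-5", while the query
        // is the phrase "2 5", so nothing matches.
        System.out.println(phraseMatches(
                splitOnWhitespace(content), splitOnNonAlnum(query))); // false
    }
}
```

With the same tokenizer on both sides, the hyphen simply disappears consistently and the phrase still lines up with the content. The second call shows how easily a mismatch between the indexing and searching side prevents a match.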