Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Content-Type: text/plain;
  charset="iso-8859-1"
From: Tatu Saloranta <tatu@hypermall.net>
Reply-To: tatu@hypermall.net
Organization: Linux-users missalie
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: Re: '-' character not interpreted correctly in field names
Date: Mon, 3 Feb 2003 08:37:18 -0700
User-Agent: KMail/1.4.3
References: <3E3E1C98.1020905@freestart.hu>
 <043e01c2cb8f$554f08c0$0201a8c0@netframe.com>
In-Reply-To: <043e01c2cb8f$554f08c0$0201a8c0@netframe.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Message-Id: <200302030837.18943.tatu@hypermall.net>

On Monday 03 February 2003 07:19, Terry Steichen wrote:
> I believe that the tokenizer treats a dash as a token separator.  Hence,
> the only way, as I recall, to eliminate this behavior is to modify
> QueryParser.jj so it doesn't do this.  However, doing this can cause some
> other problems, like hyphenated words at a line break and the like.

It might be enough to just replace the analyzer passed in to QueryParser. That
would work if QueryParser only handles the modifiers outside terms and passes
the terms themselves to the analyzer.
I think this is the case: QueryParser does call the analyzer in a couple of
places, and one word may actually expand to a phrase, or vice versa.
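A rough plain-Java sketch of that idea, assuming the parser only separates out
operators and hands each term's text to the analyzer. It does not use Lucene's
classes: `SketchAnalyzer`, `tokenize` and `toQuery` are hypothetical stand-ins.
One token becomes a term query and several tokens become a phrase query, so
swapping in an analyzer that keeps '-' inside tokens changes the result
without touching the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analyzer (NOT Lucene's Analyzer API):
// maps the raw text of one parsed term to a list of tokens.
interface SketchAnalyzer {
    List<String> tokens(String text);
}

class AnalyzerSwapSketch {

    // Treats every non-letter/digit character, including '-', as a separator.
    static final SketchAnalyzer SPLIT_ON_HYPHEN = text -> tokenize(text, false);

    // Same rules, except '-' is kept as part of a token.
    static final SketchAnalyzer KEEP_HYPHEN = text -> tokenize(text, true);

    // Lowercases and collects runs of token characters.
    static List<String> tokenize(String text, boolean keepHyphen) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean tokenChar = Character.isLetterOrDigit(c) || (keepHyphen && c == '-');
            if (tokenChar) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    // Loosely mimics what a query parser does with one term:
    // one token -> term query, several tokens -> phrase query.
    static String toQuery(String field, String termText, SketchAnalyzer analyzer) {
        List<String> toks = analyzer.tokens(termText);
        if (toks.isEmpty()) return "(no query)";
        if (toks.size() == 1) return "TermQuery(" + field + ":" + toks.get(0) + ")";
        return "PhraseQuery(" + field + ":\"" + String.join(" ", toks) + "\")";
    }

    public static void main(String[] args) {
        System.out.println(toQuery("title", "e-mail", SPLIT_ON_HYPHEN)); // PhraseQuery(title:"e mail")
        System.out.println(toQuery("title", "e-mail", KEEP_HYPHEN));     // TermQuery(title:e-mail)
    }
}
```

The grammar is untouched in this sketch; only the analyzer object changes.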

Still, using a hyphen as a separator shouldn't necessarily cause big problems
as long as the indexer does the same: a query for "2 - 5" would become a phrase
query for "2 5", which is still reasonably specific (and should match the
indexed content).
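A minimal sketch of why this still matches, assuming a hypothetical tokenizer
(not a Lucene class) that splits on every non-letter/digit character at both
index and query time. Phrase matching is simplified to "the query tokens appear
at consecutive positions in the document", with no slop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class HyphenPhraseSketch {

    // Hypothetical tokenizer: lowercases and treats anything that is not a
    // letter or digit, including '-', as a separator.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) out.add(part);
        }
        return out;
    }

    // Exact phrase match: the phrase tokens must occur consecutively in the document.
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("See pages 2-5 for details");
        List<String> query = tokenize("2 - 5");
        System.out.println(indexed);                       // [see, pages, 2, 5, for, details]
        System.out.println(query);                         // [2, 5]
        System.out.println(phraseMatches(indexed, query)); // true
    }
}
```

Because both sides drop the hyphen the same way, the positions line up; a
document saying "pages 2 and 5" would not match, since "2" and "5" are not
adjacent there.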

On the other hand, SimpleAnalyzer and StandardAnalyzer have quite different
tokenization rules, so it's important to make sure the same analyzer is used
for both indexing and searching; a mismatch can easily prevent matches.
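A small sketch of how such a mismatch loses hits. The two tokenizer modes here
are hypothetical simplifications and do not reproduce the actual rules of
SimpleAnalyzer or StandardAnalyzer; they differ only in whether '-' stays
inside a token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class AnalyzerMismatchSketch {

    // Hypothetical tokenizer: letters and digits form tokens; '-' is kept
    // inside tokens only when keepHyphen is true.
    static List<String> tokenize(String text, boolean keepHyphen) {
        String separators = keepHyphen ? "[^\\p{L}\\p{Nd}-]+" : "[^\\p{L}\\p{Nd}]+";
        List<String> out = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split(separators)) {
            if (!part.isEmpty()) out.add(part);
        }
        return out;
    }

    // Analyze the document and the query (possibly with different rules),
    // then require the query tokens to appear consecutively in the document.
    static boolean matches(String docText, boolean indexKeepsHyphen,
                           String queryText, boolean queryKeepsHyphen) {
        List<String> doc = tokenize(docText, indexKeepsHyphen);
        List<String> query = tokenize(queryText, queryKeepsHyphen);
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        String doc = "Send an e-mail today";
        System.out.println(matches(doc, true, "e-mail", true));   // true: same rules
        System.out.println(matches(doc, false, "e-mail", false)); // true: same rules
        System.out.println(matches(doc, true, "e-mail", false));  // false: index kept "e-mail", query looks for "e mail"
        System.out.println(matches(doc, false, "e-mail", true));  // false: index has "e", "mail", query looks for "e-mail"
    }
}
```

Either rule works on its own; it is only the combination of different rules
on the two sides that fails.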

-+ Tatu +-

