lucene-java-user mailing list archives

From Jon Pipitone <>
Subject Re: '-' character not interpreted correctly in field names
Date Mon, 12 May 2003 20:03:08 GMT
Hi Otis, Terry,

 >>>You can write a custom Analyzer that does not remove dashes from
 >>>tokens, and use it for both indexing and searching.
 >>>This is a frequent question and answer on this list.

Sorry for the noise, but I haven't been able to find a solution in the 
mailing list archives, or by writing my own analyzer:

	public class MyAnalyzer extends Analyzer {
		public TokenStream tokenStream(String fieldName, Reader reader) {
			return new CharTokenizer(reader) {
				// Letters and '-' are token characters, so "foo-bar" stays one token.
				protected boolean isTokenChar(char c) {
					return Character.isLetter(c) || c == '-';
				}
			};
		}
	}

I parse a query like this:

	String queryString = "foo-bar:foo";
	Query queryResult =
		QueryParser.parse(queryString, "body", new MyAnalyzer());
	System.out.println(queryResult);

With the output:
	body:foo -bar:foo

But I would expect the output:
	foo-bar:foo

If I print out the tokens that MyAnalyzer produces, I do get "foo-bar"
and then "foo".
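The token rule in MyAnalyzer can be checked without Lucene at all. The sketch below is a hypothetical standalone class (DashTokenDemo and its methods are not Lucene names) that applies the same isTokenChar test to a plain string; it mirrors only that test, not the rest of CharTokenizer's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch, not Lucene code: splits text using the
// same rule as MyAnalyzer's isTokenChar (letters and '-' are token chars,
// every other character ends the current token).
public class DashTokenDemo {

    static boolean isTokenChar(char c) {
        return Character.isLetter(c) || c == '-';
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-token character such as ':' closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo-bar:foo")); // prints [foo-bar, foo]
    }
}
```

This agrees with the tokens reported above, which suggests the analyzer is doing its job and the unwanted split happens earlier, in the query syntax itself.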

Any pointers on what I'm doing wrong?


>>>--- Jon Pipitone <> wrote:
>>>>Hi all,
>>>> > I believe that the tokenizer treats a dash as a token separator.
>>>> > Hence, the only way, as I recall, to eliminate this behavior is
>>>> > to modify QueryParser.jj so it doesn't do this.  However,
>>>> > this can cause some other problems, like hyphenated words at a
>>>> > line break and the like.
>>>>I've recently started using Lucene and I'm running into the same problem
>>>>with the query parser.  I'd like to use queries that contain a dash in
>>>>the field name, but as far as I can tell, the grammar treats field
>>>>names as terms, and so, as Terry notes, a dash becomes a token
>>>>separator.
>>>>Terry suggests modifying QueryParser.jj, presumably by creating a
>>>>separate non-terminal for field names.
>>>>Has anyone done any work on this already?  Is modifying QueryParser.jj
>>>>the best approach?
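The behaviour described in this thread can be modelled with a toy lexer: the query string is split into terms and operators before any Analyzer runs, so if '-' is an operator character at the grammar level, MyAnalyzer never sees "foo-bar". The sketch below is a deliberately simplified model, not Lucene's actual QueryParser.jj grammar; the class ToyQueryParser, its methods and the allowDashInWords switch are all hypothetical. With the switch off it reproduces the output reported above; with it on, a dash inside a word is kept as a word character, while a dash at the start of a clause is still the prohibit operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of query lexing/parsing; NOT Lucene's grammar.
// Clauses have the form  [-] [field:] term  and are separated by whitespace.
// Assumes well-formed input (a trailing '-' or ':' throws).
public class ToyQueryParser {

    static String parse(String query, String defaultField, boolean allowDashInWords) {
        List<String> tokens = lex(query, allowDashInWords);
        List<String> clauses = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean prohibited = false;
            if (tokens.get(i).equals("-")) {       // leading '-' = prohibit operator
                prohibited = true;
                i++;
            }
            String field = defaultField;
            String term = tokens.get(i++);
            if (i < tokens.size() && tokens.get(i).equals(":")) {
                field = term;                      // word before ':' is the field name
                term = tokens.get(i + 1);
                i += 2;
            }
            clauses.add((prohibited ? "-" : "") + field + ":" + term);
        }
        return String.join(" ", clauses);
    }

    static List<String> lex(String query, boolean allowDashInWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : query.toCharArray()) {
            // Optionally treat a dash that follows a word character as part of the word.
            boolean inWordDash = allowDashInWords && c == '-' && word.length() > 0;
            if (Character.isLetterOrDigit(c) || inWordDash) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (c == '-' || c == ':') {
                tokens.add(String.valueOf(c));     // operator tokens
            }
            // Whitespace and other characters are simply skipped in this model.
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("foo-bar:foo", "body", false)); // body:foo -bar:foo
        System.out.println(parse("foo-bar:foo", "body", true));  // foo-bar:foo
    }
}
```

Allowing an inner dash in every word is the simplest way to get this effect in the toy model. It is broader than a rule that applies only to field names, since it also keeps dashes inside terms (which MyAnalyzer does anyway), and it is exactly the kind of change that can interact with hyphenated words, as Terry warns.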
