lucene-java-user mailing list archives

From "Biswas, Goutam_Kumar" <Goutam-Kumar-Bis...@deshaw.com>
Subject RE: '-' character not interpreted correctly in field names
Date Tue, 13 May 2003 06:20:01 GMT
>>>>but I haven't been able to find a solution in the 
>>>>mailing list archives, or by writing my own analyzer:

I tried modifying your Analyzer code just a little bit and it worked
for me.

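import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.TokenStream;

public class MyAnalyzer extends Analyzer {
	public TokenStream tokenStream(String fieldName, Reader reader) {
		return new CharTokenizer(reader) {
			protected boolean isTokenChar(char c) {
				// Original test: return Character.isLetter(c) || c == '-';
				return true; // accept every character as part of a token
			}
		};
	}
}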

So this analyzer won't throw away any characters whatsoever. Not sure that's
what you want though!
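
To make the difference between the two isTokenChar versions concrete, here is a
minimal plain-Java sketch. It does not use Lucene at all: the class
PredicateTokenizerSketch and its tokenize method are made-up names for
illustration, and the sketch only assumes the basic idea that consecutive
characters accepted by the predicate form one token while everything else acts
as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Illustration only: a stand-alone imitation of predicate-based tokenizing.
// This is NOT Lucene's CharTokenizer; all names here are hypothetical.
class PredicateTokenizerSketch {

    // Groups consecutive characters accepted by isTokenChar into tokens.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A rejected character ends the token in progress.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "foo-bar: some text";
        // Letters plus '-': the dash stays inside the token.
        System.out.println(tokenize(text, c -> Character.isLetter(c) || c == '-'));
        // Always true: nothing is a separator, so the whole input is one token.
        System.out.println(tokenize(text, c -> true));
    }
}
```

With the always-true predicate even spaces and the colon end up inside the
single token, which is why it may not be what you want for ordinary text.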

Thanks,
-Goutam



-----Original Message-----
From: Eric Isakson [mailto:Eric.Isakson@sas.com]
Sent: Tuesday, May 13, 2003 1:50 AM
To: Lucene Users List
Subject: RE: '-' character not interpreted correctly in field names


I just looked at the QueryParser.jj code; your field names never get
processed by the analyzer. It does look like the query parser will honor
escapes, though. I haven't tried this, but try a query like "foo\-bar:foo"
and have a look at the QueryParser.jj file for how it handles field names
when parsing your query.
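
If escaping does turn out to work, building the query string could look
something like the sketch below. This is plain Java with no Lucene calls;
FieldNameEscaper, escapeDashes and fieldQuery are hypothetical names made up
for the example, and whether the parser really accepts an escaped dash in a
field name is still the untested assumption from above.

```java
// Illustration only: a hypothetical helper, not part of Lucene.
// Assumes (untested) that the query parser honors "\-" in a field name.
class FieldNameEscaper {

    // Puts a backslash in front of every '-' in the field name.
    static String escapeDashes(String fieldName) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (c == '-') {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    // Builds a "field:term" query string with the field name escaped.
    static String fieldQuery(String fieldName, String term) {
        return escapeDashes(fieldName) + ":" + term;
    }

    public static void main(String[] args) {
        // Prints the query text that would be handed to the parser.
        System.out.println(fieldQuery("foo-bar", "foo"));
    }
}
```

Note that if you type the query as a literal in Java source, the backslash
itself has to be doubled ("foo\\-bar:foo"), otherwise it won't compile.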

Eric

-----Original Message-----
From: Jon Pipitone [mailto:jpipitone@mshri.on.ca] 
Sent: Monday, May 12, 2003 4:03 PM
To: Lucene Users List
Subject: Re: '-' character not interpreted correctly in field names


Hi Otis, Terry,

>>>You can write a custom Analyzer that does not remove dashes from
>>>tokens, and use it for both indexing and searching.
>>>
>>>This is a frequent question and answer on this list.

Sorry for the noise, but I haven't been able to find a solution in the 
mailing list archives, or by writing my own analyzer:

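	import java.io.Reader;

	import org.apache.lucene.analysis.Analyzer;
	import org.apache.lucene.analysis.CharTokenizer;
	import org.apache.lucene.analysis.TokenStream;

	public class MyAnalyzer extends Analyzer {
		public TokenStream tokenStream(String fieldName, Reader reader) {
			return new CharTokenizer(reader) {
				protected boolean isTokenChar(char c) {
					// Keep letters and dashes inside a token.
					return Character.isLetter(c) || c == '-';
				}
			};
		}
	}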

I parse a query like this:

	String queryString = "foo-bar:foo";
	Query query =
		QueryParser.parse(queryString, "body", new MyAnalyzer());
	String queryResult = query.toString();

With the output:
	body:foo -bar:foo

But I would expect the output:
	 foo-bar:foo

If I print out the tokens that MyAnalyzer produces I do get "foo-bar" 
and then "foo".

Any pointers on what I'm doing wrong?

jp



>>>--- Jon Pipitone <jpipitone@mshri.on.ca> wrote:
>>>
>>>>Hi all,
>>>>
>>>> > I believe that the tokenizer treats a dash as a token separator.
>>>> > Hence, the only way, as I recall, to eliminate this behavior is
>>>> > to modify QueryParser.jj so it doesn't do this.  However, doing
>>>> > this can cause some other problems, like hyphenated words at a
>>>> > line break and the like.
>>>>
>>>>I've recently started using Lucene and I'm running into the same
>>>>issue with the query parser.  I'd like to use queries that contain
>>>>dashes in the field name, but as far as I can tell it seems that the
>>>>current query grammar treats field names as terms, and so, as Terry
>>>>notes, a dash becomes a token separator.
>>>>
>>>>Terry suggests modifying the QueryParser.jj -- I would suspect by
>>>>creating a separate non-terminal for field names.
>>>>
>>>>Has anyone done any work on this already?  Is modifying
>>>>QueryParser.jj the best approach?
>>>>
>>>>Thanks,
>>>>jp


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

