Hi Otis, Terry,
>>>You can write a custom Analyzer that does not remove dashes from
>>>tokens, and use it for both indexing and searching.
>>>
>>>This is a frequent question and answer on this list.
Sorry for the noise, but I haven't been able to find a solution in the
mailing list archives, or by writing my own analyzer:
public class MyAnalyzer extends Analyzer {
public TokenStream tokenStream(String fieldName, Reader reader) {
return new CharTokenizer(reader) {
protected boolean isTokenChar(char c) {
return Character.isLetter(c) || c == '-';
}
};
}
}
I parse a query like this:
String queryString = "foo-bar:foo";
String queryResult =
QueryParser.parse(queryString, "body", new MyAnalyzer())
With the output:
body:foo -bar:foo
But I would expect the output:
foo-bar:foo
If I print out the tokens that MyAnalyzer produces I do get "foo-bar"
and then "foo".
Any pointers on what I'm doing wrong?
jp
>>>--- Jon Pipitone <jpipitone@mshri.on.ca> wrote:
>>>
>>>>Hi all,
>>>>
>>>> > I believe that the tokenizer treats a dash as a token
>>>
>>separator.
>>
>>>> > Hence, the only way, as I recall, to eliminate this behavior
>>>
>>is
>>
>>>> > to modify QueryParser.jj so it doesn't do this. However,
>>>
>>doing
>>
>>>> > this can cause some other problems, like hyphenated words at a
>>>> > line break and the like.
>>>>
>>>>I've recently started using lucene and I'm running into the same
>>>>issue
>>>>with the query parser. I'd like to use queries that contain
>>>
>>dashes
>>
>>>>in
>>>>the field name, but as far as I can tell it seems that the
>>>
>>current
>>
>>>>query
>>>>grammar treats field names as terms, and so, as Terry notes, a
>>>
>>dash
>>
>>>>becomes a token seperator.
>>>>
>>>>Terry suggests modifying the QueryParser.jj -- I would suspect by
>>>>creating a seperate non-terminal for field names.
>>>>
>>>>Has anyone done any work on this already? Is modifying
>>>>QueryParser.jj
>>>>the best approach?
>>>>
>>>>Thanks,
>>>>jp
>>>>
>>>>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
|