lucy-user mailing list archives

From Nick Wellnhofer <>
Subject Re: [lucy-user] Cannot get the exact phrase match
Date Thu, 27 Dec 2012 12:40:39 GMT
On Dec 26, 2012, at 11:00, Aleksandar Radovanovic <> wrote:

> However, if I, for example, search for a chemistry-related phrase: OF(+)
> search returns no result. On the other hand, the quoted phrase: "OF(+)"
> returns every single document containing the preposition "of". The
> highlighter clearly shows that "OF(+)" was still not found, as the
> "(+)" part was not highlighted.
> Is there an easy solution, or must I analyze the user's input and decide
> what to use: IndexSearcher for non-quoted queries and
> TermQuery/PhraseQuery for quoted ones, or must I create some special regex
> rules for words containing non-letters? There are many of these in the
> biomedical field.

You can use the RegexTokenizer to define how your documents are split into tokens:
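To see why the quoted "OF(+)" behaves this way, here is a rough Python sketch of what a regex-based tokenizer does with it. It assumes what I believe is RegexTokenizer's documented default pattern, \w+(?:[\x{2019}']\w+)*, and uses lowercasing as a stand-in for a case-folding step later in the analyzer chain. Python's re module only approximates the Perl-syntax regex Lucy actually uses, and `tokenize` is a hypothetical helper, not part of Lucy. Because "(", "+" and ")" are not word characters, "OF(+)" reduces to the single token "of". That would explain why the phrase matches every document containing "of" and why the highlighter leaves "(+)" alone.

```python
import re

# Assumed default RegexTokenizer pattern, translated to Python syntax:
# runs of word characters, optionally joined by an apostrophe (ASCII or U+2019).
DEFAULT_PATTERN = r"\w+(?:[\u2019']\w+)*"

def tokenize(text, pattern=DEFAULT_PATTERN):
    """Hypothetical helper: every match of `pattern` becomes a token.

    Lowercasing is an assumption standing in for a separate case-folding
    step in the analyzer chain; the tokenizer itself does not do it.
    """
    return [m.group(0).lower() for m in re.finditer(pattern, text)]

# Parens and the plus sign are silently dropped, so only "of" survives.
print(tokenize("OF(+)"))
print(tokenize("the effect of OF(+) on the cell"))
```

If the same analyzer is applied at index time and at query time, the query side and the document side both lose "(+)", so there is nothing left for the phrase to distinguish.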

To handle the use case described above, you could, for example, add parens and the plus sign
to the list of word characters, so your pattern would look something like '[\w()+]+'. But
this would match parens everywhere, which is probably not what you want. Another approach is
to split on parens and create tokens for sequences of plus signs, resulting in a pattern like


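The trade-off between the two approaches can be sketched in Python. '[\w()+]+' is the pattern from the paragraph above. The "split" pattern \w+|\++ is only an illustrative assumption of what splitting on parens while keeping runs of plus signs could look like; it is not necessarily the pattern from the original message. `phrase_match` is a hypothetical toy stand-in for a phrase query (tokens must appear contiguously, in order), not Lucy's PhraseQuery, and the two sample documents are made up.

```python
import re

# Pattern from the message: parens and "+" count as word characters.
WIDE_PATTERN = r"[\w()+]+"
# Illustrative assumption: runs of word characters OR runs of plus signs;
# parens never become part of a token.
SPLIT_PATTERN = r"\w+|\++"

def tokenize(text, pattern):
    """Every match of `pattern` is a token; lowercasing mimics case folding."""
    return [m.group(0).lower() for m in re.finditer(pattern, text)]

def phrase_match(doc_tokens, phrase_tokens):
    """Toy phrase check: phrase_tokens occur as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))

# Made-up sample documents.
docs = [
    "Reduction of OF(+) in solution",
    "The role of oxygen (see below)",
]

for name, pattern in [("wide", WIDE_PATTERN), ("split", SPLIT_PATTERN)]:
    query = tokenize("OF(+)", pattern)  # same analysis for query and documents
    hits = [d for d in docs if phrase_match(tokenize(d, pattern), query)]
    print(name, query, hits)
    # Shows how ordinary parenthesized text is tokenized by each pattern.
    print("  ", tokenize(docs[1], pattern))
```

Both patterns find only the first document, but the wide pattern also produces tokens like "(see" and "below)" from ordinary prose, so a search for "see" would miss that text. The split pattern keeps prose clean at the cost of treating "OF(+)" as the phrase "of +", which could also match text such as "of + ions".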