lucene-java-user mailing list archives

From Mark Miller <>
Subject New Lucene QueryParser
Date Wed, 06 Dec 2006 01:21:57 GMT
I have finally delved back into the Lucene query parser that I started a
few months back, and I am very close to wrapping up its initial
development. I am currently looking for anybody willing to help me out
with a little testing and maybe some design consultation (I am not happy
with the current range query syntax, for one). If you have any
interest in using this parser and have a little time to help out,
please do. The parser is extremely customizable and you can basically
mold it into whatever you want. A brief outline of the feature set:

The basics from Lucene query parser are covered: escaping operators, 
handling tokens at the same position, range queries, etc.

Default Operators are: & | ! ~ ( )
New operators can be defined and default operators can be hidden on the fly.

Adds a proximity operator to the standard AND, OR, and ANDNOT operators 
allowing for queries like:
(search bear) ~5 (snake & horse ~4 pope) | crazy query

The default space operator is customizable and can be made to bind
tighter than the explicit operator it stands for (it acts like the
explicit operator, but as if wrapped in parentheses).

The order of operations for the operators is customizable. The default
order is |, &, ~, !, ( ), but you can change it to whatever you want.
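
A minimal sketch of how a configurable order of operations could work,
using precedence climbing over a table built at construction time. This
is not the parser described in this message: the class name
PrecedenceSketch is hypothetical, the order is assumed to be listed from
loosest-binding to tightest-binding, every operator is treated as a
binary infix operator, and the implicit space operator and proximity
distances (such as ~5) are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the parser announced in this message.
class PrecedenceSketch {
    private final Map<String, Integer> level = new HashMap<>();
    private List<String> tokens;
    private int pos;

    // Assumption: operators are given from loosest-binding to tightest-binding.
    PrecedenceSketch(String... loosestToTightest) {
        for (int i = 0; i < loosestToTightest.length; i++) {
            level.put(loosestToTightest[i], i);
        }
    }

    // Returns the query fully parenthesized so the grouping is visible.
    String parse(String query) {
        tokens = tokenize(query);
        pos = 0;
        String result = parseExpression(0);
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        }
        return result;
    }

    // Precedence climbing: consume operators whose level is at least minLevel.
    private String parseExpression(int minLevel) {
        String left = parsePrimary();
        while (pos < tokens.size()) {
            Integer opLevel = level.get(tokens.get(pos));
            if (opLevel == null || opLevel < minLevel) {
                break;
            }
            String op = tokens.get(pos++);
            // Requiring a strictly higher level on the right makes operators left-associative.
            String right = parseExpression(opLevel + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // A primary is either a parenthesized sub-expression or a single term.
    private String parsePrimary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            String inner = parseExpression(0);
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("Missing closing parenthesis");
            }
            pos++;
            return inner;
        }
        if (token.equals(")") || level.containsKey(token)) {
            throw new IllegalArgumentException("Expected a term but found: " + token);
        }
        return token;
    }

    // Terms are runs of letters or digits; any other non-space character is its own token.
    private static List<String> tokenize(String query) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < query.length() && Character.isLetterOrDigit(query.charAt(i))) {
                    i++;
                }
                out.add(query.substring(start, i));
            } else {
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Default order from the message: & binds tighter than |.
        System.out.println(new PrecedenceSketch("|", "&", "~", "!").parse("a | b & c"));
        // Swapped order: | now binds tighter than &.
        System.out.println(new PrecedenceSketch("&", "|").parse("a | b & c"));
    }
}
```

With the default order the query a | b & c groups as (a | (b & c));
with & and | swapped it groups as ((a | b) & c). Only the table
changes, not the parsing code.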

Query-time thesaurus expansion / General token to query expansion : 
Takes advantage of a general find/replace feature, "expand" might map to 
"(expander | expanded)" ... or any other valid syntax. There is also a 
slower RegEx feature so that you can match tokens with a Pattern and 
perform back-reference-enabled replacements. You can also make the
replacement behave as an operator: you might map NEAR to ~10, creating
a new operator that performs within-10-word proximity searches.
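
The expansion step can be sketched roughly as follows, assuming a plain
whitespace tokenizer, a literal lookup table that is tried first, and a
slower list of java.util.regex patterns tried second. The class name
TokenExpansionSketch and the W/n pattern are hypothetical and exist only
for illustration; the "expand" and NEAR mappings are the ones mentioned
above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the parser announced in this message.
class TokenExpansionSketch {
    private final Map<String, String> literalRules = new LinkedHashMap<>();
    private final Map<Pattern, String> regexRules = new LinkedHashMap<>();

    // Fast path: an exact token maps to replacement query text.
    void addLiteral(String token, String replacement) {
        literalRules.put(token, replacement);
    }

    // Slow path: a whole-token pattern whose replacement may use $1, $2, ...
    void addRegex(String pattern, String replacement) {
        regexRules.put(Pattern.compile(pattern), replacement);
    }

    String expandToken(String token) {
        String direct = literalRules.get(token);
        if (direct != null) {
            return direct;
        }
        for (Map.Entry<Pattern, String> rule : regexRules.entrySet()) {
            Matcher m = rule.getKey().matcher(token);
            if (m.matches()) {
                // The match spans the whole token, so the buffer ends up
                // holding only the replacement with back references filled in.
                StringBuffer out = new StringBuffer();
                m.appendReplacement(out, rule.getValue());
                return out.toString();
            }
        }
        return token; // no rule applies
    }

    // Assumption: tokens are separated by whitespace.
    String expandQuery(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(expandToken(token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TokenExpansionSketch expander = new TokenExpansionSketch();
        expander.addLiteral("expand", "(expander | expanded)");
        expander.addLiteral("NEAR", "~10");     // a word that behaves as an operator
        expander.addRegex("W/(\\d+)", "~$1");   // hypothetical W/n syntax
        System.out.println(expander.expandQuery("expand NEAR W/5 query"));
    }
}
```

The expanded text would then be handed to the query parser as if the
user had typed it, which is what lets a replacement such as ~10 act as a
new operator.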

Did You Mean feature using the SpellCheck contrib: if you search for 
'date(Aug 3, 1952) & mackine | rabbit' you might get a suggestion of : 
'date(Aug 3, 1952) & machine | rabbit'
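
A rough sketch of the idea behind the suggestion step: leave operators
and date syntax untouched, and replace only plain word tokens that are
missing from a dictionary. It uses a simple Levenshtein edit distance
over an in-memory word list as a stand-in; it does not use the
SpellCheck contrib or reproduce how that contrib ranks suggestions. The
class name DidYouMeanSketch, the dictionary contents, and the distance
limit are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; a stand-in for a real spell checker.
class DidYouMeanSketch {

    // Classic two-row Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Replaces unknown lowercase word tokens with the closest dictionary word
    // within maxDistance edits; every other token is passed through unchanged.
    static String suggest(String query, List<String> dictionary, int maxDistance) {
        String[] tokens = query.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (!token.matches("[a-z]+") || dictionary.contains(token)) {
                continue; // operators, dates, numbers, and known words stay as typed
            }
            String best = null;
            int bestDistance = maxDistance + 1;
            for (String word : dictionary) {
                int distance = editDistance(token, word);
                if (distance < bestDistance) {
                    best = word;
                    bestDistance = distance;
                }
            }
            if (best != null) {
                tokens[i] = best;
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("machine", "rabbit", "magazine");
        System.out.println(suggest("date(Aug 3, 1952) & mackine | rabbit", dictionary, 2));
    }
}
```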

Paragraph/Sentence proximity search functionality. You can inject tokens
to specify paragraph and sentence markers and perform SpanNotWithin
searches for paragraph and sentence proximity.

Customizable date parser.

Everything is pretty much configurable on the fly.

Note that there may be some limitations...but so far this has proved to 
be pretty powerful. I could sure use some testing help making it 
production ready though. I will be putting a new website up for the 
parser soon. Please send me a note if you can help out at all. When I 
put up the jar you can just run it with java -jar and it will provide a
console input to enter queries and see the Lucene Query generated.

- Mark Miller
