lucene-java-user mailing list archives

From Mark Miller
Subject Re: Proximity Query Parser
Date Fri, 01 Sep 2006 10:54:29 GMT
Paul Elschot wrote:
> Mark,
> On Thursday 31 August 2006 23:18, Mark Miller wrote:
>> I am not a huge fan of the queryparser's syntax so I have started an 
>> open source project to create a viable alternative. I could really use 
>> some help testing it out. The more I can get it tested the better 
>> chance it has of serving the community. The parser is called Qsol. I am 
>> right up against its initial release. So far it:
>> offers a simple clean syntax.
>> allows arbitrary combinations/nesting of proximity and boolean queries.
> Could you say in a few words how the combination of proximity and boolean
> is implemented in Qsol?
> I found this the most difficult thing to implement in surround. In surround, 
> every subquery that can be a proximity subquery has two (groups of) methods: 
> one for use as boolean and one for use as proximity.
> I'd like to have a mechanism that allows mixing proximity and boolean queries 
> built into Lucene.
> Did you also implement parsed phrases with Lucene's PhraseQuery?
> Surround does not have that.
> Regards,
> Paul Elschot
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
Hi Paul,

I'm afraid my programming is probably quite a ways behind yours, so I doubt 
anything I have done will be of any help to you.

I also have to treat things differently depending on whether I am in a 
proximity clause or a boolean clause. A wildcard in a boolean clause is 
mapped to a wildcard query. A wildcard in a proximity clause is mapped to 
a regex span that has been modified to handle only * and ?. When I run 
into a proximity, I collect a small tree for each clause and distribute 
them against each other. For example, (old | map) ~3 big gets distributed 
to old ~3 big | map ~3 big. This distribution method appears to handle 
all boolean/proximity nesting and mixing cases for me, including: 
great ! "big old phrase search" ~5 (holy ~4 (big black bear)). The 
distribution maintains order of operations, but it can obviously create 
some pretty large queries.
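
To make the distribution step concrete, here is a minimal sketch in plain 
Java. It is not Qsol's actual code: the class and method names (Term, Or, 
Near, distribute, render) are made up for illustration, and it renders the 
expanded query as a string rather than building Lucene span queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProximityDistribution {

    // Hypothetical mini syntax tree: a term, an OR of alternatives,
    // or a proximity (left ~slop right).
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Or implements Node {
        final List<Node> alternatives;
        Or(List<Node> alternatives) { this.alternatives = alternatives; }
    }

    static final class Near implements Node {
        final Node left;
        final Node right;
        final int slop;
        Near(Node left, Node right, int slop) {
            this.left = left;
            this.right = right;
            this.slop = slop;
        }
    }

    // Returns the OR-free alternatives that a node expands to.
    static List<String> distribute(Node node) {
        List<String> out = new ArrayList<String>();
        if (node instanceof Term) {
            out.add(((Term) node).text);
        } else if (node instanceof Or) {
            // Each OR branch contributes its own alternatives.
            for (Node alt : ((Or) node).alternatives) {
                out.addAll(distribute(alt));
            }
        } else if (node instanceof Near) {
            // Cross product: every left alternative against every right one.
            Near near = (Near) node;
            for (String l : distribute(near.left)) {
                for (String r : distribute(near.right)) {
                    out.add(wrap(l) + " ~" + near.slop + " " + wrap(r));
                }
            }
        }
        return out;
    }

    // Parenthesize nested proximities so the grouping stays explicit.
    private static String wrap(String s) {
        return s.contains(" ") ? "(" + s + ")" : s;
    }

    // Joins the alternatives back into a single top-level OR.
    static String render(Node node) {
        return String.join(" | ", distribute(node));
    }

    public static void main(String[] args) {
        Node query = new Near(
                new Or(Arrays.<Node>asList(new Term("old"), new Term("map"))),
                new Term("big"),
                3);
        System.out.println(render(query)); // prints: old ~3 big | map ~3 big
    }
}
```

The cross product in the Near branch is where the size blow-up comes from: 
a proximity with m alternatives on one side and n on the other expands to 
m * n clauses, and nesting multiplies that again.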

I did not use the phrase search because I do not like how the slop works 
(not in order, etc.), so both in and out of proximity I use a nearspan 
instead. For a multiphrase search I use an OrSpan on words in the same 

- Mark
