lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject extensible query parser - Re: Proximity Searches behavior
Date Wed, 09 Jun 2004 20:39:47 GMT
Erik Hatcher wrote:

> On Jun 9, 2004, at 12:21 PM, David Spencer wrote:
>
>>> show us that most folks query with 1 - 3 words and do not use
>>> any of the advanced features.
>>
>> But with automagic query expansion these things might be done behind 
>> the scenes.  Nutch, for one, expands simple queries to check against 
>> multiple fields, with different boosts, and even gives a bonus for 
>> terms that are near each other.
>
>
> Ah yes!  Don't worry, I hadn't forgotten about Nutch.  I'm tinkering 
> with its query parsing and analysis as we speak in fact.  Very clever 
> indeed.
>
>>> The elegance of the query syntax is quite important, and QueryParser 
>>> has gotten a bit hairy.  I would enjoy discussions on creating new 
>>> query parsers (one size doesn't fit all, I don't think) and what syntax
>>
>> I suggested in some email a while ago making the QueryParser
>> extensible at runtime or startup time, so you can add other types of
>> queries that it doesn't support: you'd have a way of registering
>> these other query types (SpanQuery, SubstringQuery, etc.) and then some
>> syntax like "span:foo" to invoke the query expander registered with
>> "span" on "foo"...
>
>
> I would be curious to see how an implementation of this played out.  
> For example, could I add my own syntax such that
>
>     "some phrase" <-3-> "another phrase"
>
> could be parsed into a SpanNearQuery of two SpanNearQuerys?
>
> I like the idea of a flexible run-time grammar, but it sounds too good 
> to be true in a general purpose kinda way.
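A rough sketch of how Erik's `<-N->` syntax could be handled by one dedicated sub-parser. To keep it runnable without Lucene on the classpath, SpanTerm and SpanNear are hypothetical stand-ins that only render their structure as text; in Lucene the corresponding classes would be SpanTermQuery and SpanNearQuery, whose constructor likewise takes the clauses, a slop, and an inOrder flag. Treating `<-3->` as unordered (inOrder=false) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for a span query; render() just shows the structure as text.
interface SpanQuery {
    String render();
}

// Stand-in for a single-term span (Lucene: SpanTermQuery).
final class SpanTerm implements SpanQuery {
    private final String text;
    SpanTerm(String text) { this.text = text; }
    public String render() { return text; }
}

// Stand-in for a proximity span (Lucene: SpanNearQuery(clauses, slop, inOrder)).
final class SpanNear implements SpanQuery {
    private final List<SpanQuery> clauses;
    private final int slop;
    private final boolean inOrder;

    SpanNear(List<SpanQuery> clauses, int slop, boolean inOrder) {
        this.clauses = clauses;
        this.slop = slop;
        this.inOrder = inOrder;
    }

    public String render() {
        StringBuilder sb = new StringBuilder("near(");
        for (SpanQuery c : clauses) {
            sb.append(c.render()).append(", ");
        }
        return sb.append("slop=").append(slop)
                 .append(", inOrder=").append(inOrder).append(')').toString();
    }
}

// Hypothetical sub-parser for: "phrase one" <-N-> "phrase two"
final class ProximityPhraseParser {
    private static final Pattern SYNTAX =
        Pattern.compile("\\s*\"([^\"]+)\"\\s*<-(\\d+)->\\s*\"([^\"]+)\"\\s*");

    static SpanQuery parse(String input) {
        Matcher m = SYNTAX.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("expected \"a b\" <-N-> \"c d\": " + input);
        }
        int slop = Integer.parseInt(m.group(2));
        // Outer span: the two phrases within `slop` positions, in either order (assumption).
        return new SpanNear(List.of(phrase(m.group(1)), phrase(m.group(3))), slop, false);
    }

    // An exact phrase as a span: adjacent terms (slop 0), in order.
    private static SpanQuery phrase(String text) {
        List<SpanQuery> terms = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            terms.add(new SpanTerm(word));
        }
        return new SpanNear(terms, 0, true);
    }
}
```

The nesting is the point: each quoted phrase becomes its own slop-0 ordered span, and the `<-N->` operator only controls the slop of the outer span.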

My idea isn't perfect for humans, but at least it lets you use query types
that aren't hard-coded.

You have something like

[1] how you register (this could live in the existing QueryParser)

void register(String name, SubQueryParser qp)

[2] what you register

interface SubQueryParser
{
    Query parse(String s); // parses the string the user entered into a Query
}

[3] example of registration

register("substring", new SubstringQP());  // like a prefix match, but the text may appear anywhere in the term
register("span", new SurroundQP());
register("syn", new SynonymExpanderQP());  // expands a word to include its synonyms
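A minimal sketch of [1] and [2] as plain Java, assuming a standalone registry object. Query is a hypothetical stand-in interface (it only renders query syntax text) so the sketch compiles without Lucene; SubQueryRegistry and its refuse-duplicates policy are illustrative choices, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Lucene's Query so the sketch runs without Lucene.
interface Query {
    String toQueryString();
}

// [2] what you register: turns the text after "NAME::" into a Query.
interface SubQueryParser {
    Query parse(String s);
}

// [1] how you register: a name -> sub-parser table the main parser consults.
final class SubQueryRegistry {
    private final Map<String, SubQueryParser> parsers = new HashMap<>();

    void register(String name, SubQueryParser qp) {
        // Refuse silent overrides so two plug-ins can't fight over one name.
        if (parsers.putIfAbsent(name, qp) != null) {
            throw new IllegalArgumentException("already registered: " + name);
        }
    }

    SubQueryParser lookup(String name) {
        SubQueryParser qp = parsers.get(name);
        if (qp == null) {
            throw new IllegalArgumentException("no sub-parser registered as: " + name);
        }
        return qp;
    }
}
```

Because SubQueryParser has a single method, small parsers can even be registered as lambdas, e.g. `registry.register("exact", s -> () -> "contents:" + s);`.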

[4] syntax

normal QueryParser syntax, plus an extra form like "NAME::TEXT" (note the two
colons), so

this:          "black syn::bird"

expands into calls in the new extensible query parser, something like

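BooleanQuery bq = new BooleanQuery();
// "black": plain word, optional TermQuery on the default field
bq.add(new TermQuery(new Term("contents", "black")), false, false);
// "syn::bird": "bird" goes to the SynonymExpanderQP registered as "syn"
bq.add(new SynonymExpanderQP().parse("bird"), false, false);
return bq;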
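A minimal, Lucene-free sketch of that dispatch step, assuming whitespace-separated input only (no quotes, +/-, or parentheses, which the real QueryParser grammar would still have to handle). Query, TermClause, and OrQuery are hypothetical stand-ins that just render query syntax; OrQuery plays the role of a BooleanQuery whose clauses are all optional.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Lucene's Query: renders itself in query syntax.
interface Query {
    String toQueryString();
}

// Stand-in for a term query on one field.
final class TermClause implements Query {
    private final String field, text;
    TermClause(String field, String text) { this.field = field; this.text = text; }
    public String toQueryString() { return field + ":" + text; }
}

// Stand-in for a BooleanQuery whose clauses are all optional (OR).
final class OrQuery implements Query {
    private final List<Query> clauses = new ArrayList<>();
    void add(Query q) { clauses.add(q); }
    public String toQueryString() {
        List<String> parts = new ArrayList<>();
        for (Query q : clauses) parts.add(q.toQueryString());
        return "(" + String.join(" ", parts) + ")";
    }
}

interface SubQueryParser {
    Query parse(String s);
}

final class ExtensibleQueryParser {
    private final String defaultField;
    private final Map<String, SubQueryParser> registry = new HashMap<>();

    ExtensibleQueryParser(String defaultField) { this.defaultField = defaultField; }

    void register(String name, SubQueryParser qp) { registry.put(name, qp); }

    // Handles only whitespace-separated words and NAME::TEXT tokens.
    Query parse(String input) {
        OrQuery bq = new OrQuery();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            int sep = token.indexOf("::");
            if (sep > 0) {
                // NAME::TEXT -> hand TEXT to whatever was registered under NAME
                String name = token.substring(0, sep);
                SubQueryParser qp = registry.get(name);
                if (qp == null) {
                    throw new IllegalArgumentException("unknown sub-parser: " + name);
                }
                bq.add(qp.parse(token.substring(sep + 2)));
            } else {
                // plain word -> term on the default field, as today
                bq.add(new TermClause(defaultField, token));
            }
        }
        return bq;
    }
}
```

Plain words take exactly the path they take today, which is what keeps the existing syntax backward compatible; only tokens containing "::" are routed elsewhere.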

Behind the scenes, SynonymExpanderQP expands "bird" into the query
equivalent of, um, "bird avian^.5 wingedanimal^.5" or whatnot.
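A sketch of what such a "syn" sub-parser might do, assuming an in-memory synonym table and a single fixed boost for every synonym. The table, the 0.5 boost, and the constructor shape are hypothetical, and the result is returned as query-syntax text through a stand-in Query interface rather than as Lucene objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for Lucene's Query: renders itself in query syntax.
interface Query {
    String toQueryString();
}

interface SubQueryParser {
    Query parse(String s);
}

// Hypothetical "syn" sub-parser: the word itself at full weight,
// each known synonym as an optional clause with a lower boost.
final class SynonymExpanderQP implements SubQueryParser {
    private final String field;
    private final Map<String, List<String>> synonyms;
    private final float synonymBoost;

    SynonymExpanderQP(String field, Map<String, List<String>> synonyms, float synonymBoost) {
        this.field = field;
        this.synonyms = synonyms;
        this.synonymBoost = synonymBoost;
    }

    public Query parse(String word) {
        List<String> clauses = new ArrayList<>();
        clauses.add(field + ":" + word);
        for (String syn : synonyms.getOrDefault(word, List.of())) {
            clauses.add(field + ":" + syn + "^" + synonymBoost);
        }
        String rendered = "(" + String.join(" ", clauses) + ")";
        return () -> rendered;
    }
}
```

A word with no entry in the table simply comes back as a single term, so the expander is safe to apply to anything.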

[5] the point

Be backward compatible and "natural" for the existing query syntax, but
leave a hook so that if you innovate and write new query expansion code,
there's some hope of people using it, since in theory they can drop it in
and use it without coding. Right now, if you write code in this area, I
suspect there's little chance people will try it out, as there's too much
friction.
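One way to get the "drop it in and use it without coding" part is to register sub-parsers from configuration at startup instead of in code. A minimal sketch, assuming a name to class-name map (which could be read from a java.util.Properties file) and that each plug-in class has a no-arg constructor; PluginLoader, LowercaseQP, and the stand-in Query interface are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in for Lucene's Query: renders itself in query syntax.
interface Query {
    String toQueryString();
}

interface SubQueryParser {
    Query parse(String s);
}

// Hypothetical drop-in parser: lowercases its text into one term.
final class LowercaseQP implements SubQueryParser {
    public Query parse(String s) {
        String rendered = "contents:" + s.toLowerCase(Locale.ROOT);
        return () -> rendered;
    }
}

final class PluginLoader {
    // config maps a registration name to a fully qualified class name,
    // e.g. entries read at startup from a properties file.
    static Map<String, SubQueryParser> load(Map<String, String> config) {
        Map<String, SubQueryParser> registry = new HashMap<>();
        for (Map.Entry<String, String> e : config.entrySet()) {
            try {
                Object instance = Class.forName(e.getValue())
                        .getDeclaredConstructor()
                        .newInstance();
                registry.put(e.getKey(), (SubQueryParser) instance);
            } catch (ReflectiveOperationException | ClassCastException ex) {
                throw new IllegalArgumentException(
                        "cannot load sub-parser " + e.getKey() + " = " + e.getValue(), ex);
            }
        }
        return registry;
    }
}
```

Failing loudly on a bad entry is a design choice: a typo in the config shows up at startup rather than as a silently missing query type.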

>
>     Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
