lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: contrib/surround
Date Sun, 05 Jun 2005 09:07:31 GMT
How about putting this here:

http://wiki.apache.org/general/SummerOfCode2005

It seems to be a nice fit for the sponsor.

Regards,
Paul Elschot


On Saturday 04 June 2005 22:25, Paul Elschot wrote:
> On Monday 30 May 2005 02:44, Erik Hatcher wrote:
> > I concur with Daniel on this.  For the moment, my preference is to
> > bring Paul's parser into contrib/surround and let it gain some
> > additional exposure there.  I don't believe it's possible or even
> > preferable to attempt to build one query parser to rule them all.
> > While a decent general purpose one is handy, I'm finding that my
> > projects really demand more custom parsing capabilities than the
> > built-in QueryParser can handle and that the quirks of the current
> > parser sometimes cause frustration.
> > 
> > Perhaps over time, the built-in QueryParser can adopt some additional
> > capabilities, such as supporting the SpanQuery family, but let's take
> > that sort of thing slowly.
> > 
> 
> How about extending the surround parser to allow the use of all
> queries currently in Lucene? The goal would be to allow as many
> queries as possible.
> 
> The queries not available in the current surround parser are:
> - FuzzyQuery, WildcardQuery, PrefixQuery
> - SpanFirstQuery
> - SpanNotQuery
> - MultiPhraseQuery (or the various phrase scorers)
> - optional terms/clauses
> 
> FuzzyQuery and SpanFirstQuery could be done with a prefix operator
> including a number (like the nn in the nnN near operator) followed by a
> single query, with appropriate restrictions.
> A prefix operator followed by a single query is currently not present, but
> it would be relatively easy to add.
> SpanNotQuery always has two subqueries, so it would need only an infix
> operator.
> MultiPhraseQuery would need an infix operator and a prefix operator, just
> like the N and W operators, and a restriction to terms, truncations and OR
> as subqueries.
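
To make the "number plus operator letter" shape concrete, here is a minimal
sketch in plain Python. It is not the Surround parser and not Lucene API: the
function name, the two-digit limit and the default distance of 1 are
assumptions made only for illustration. The letters N and W are the ones
mentioned above.

```python
import re

# Assumed token shape for this sketch: up to two digits, then N or W.
_OPERATOR = re.compile(r"(\d{1,2})?([NW])")


def parse_distance_operator(token):
    """Split a token such as '3N' into (distance, operator letter).

    Hypothetical helper: a missing number is treated as distance 1.
    """
    match = _OPERATOR.fullmatch(token)
    if match is None:
        raise ValueError(f"not a distance operator: {token!r}")
    digits, letter = match.groups()
    distance = int(digits) if digits else 1
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return distance, letter


print(parse_distance_operator("3N"))
print(parse_distance_operator("W"))
```

A new prefix operator carrying a number could reuse the same tokenizing step
with a different letter; which letter to use is left open here.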
> 
> Left truncation could also be allowed; truncations currently have to
> start with a normal character.
> Truncation might also be left to WildcardQuery and
> PrefixQuery instead of the current "equivalent" in Surround
> that uses regular expressions to find the matching terms.
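
The regular-expression approach to truncation can be sketched roughly as
follows, again in plain Python and independent of Lucene. The wildcard
characters ('*' for any run of characters, '?' for exactly one) and the
allow_left switch are assumptions for this sketch, and the term list stands in
for the terms of an index.

```python
import re


def truncation_to_regex(pattern, allow_left=False):
    """Translate a truncated term into a compiled regular expression.

    Assumed wildcards: '*' matches any run of characters,
    '?' matches exactly one character.
    """
    if not allow_left and pattern[:1] in ("*", "?"):
        raise ValueError("truncation must start with a normal character")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # literal characters are escaped
    return re.compile("".join(parts))


def matching_terms(pattern, terms, allow_left=False):
    """Return the terms that match the truncation pattern as a whole."""
    regex = truncation_to_regex(pattern, allow_left)
    return [t for t in terms if regex.fullmatch(t)]


terms = ["parse", "parser", "parsing", "sparse"]
print(matching_terms("pars*", terms))
print(matching_terms("*parse", terms, allow_left=True))
```

With left truncation allowed, every term has to be tested against the
pattern, which is one reason a limit on the number of expanded terms matters.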
> 
> That leaves the optional terms/clauses, and I can't think of an easy way to
> handle these. Any ideas? OR does not work for this because it requires
> at least one of its operands to match. The normal QueryParser syntax for
> this is +aa bb cc, where bb and cc are the optional parts.
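
The difference between optional clauses and OR can be shown with a small
sketch over sets of terms. This is plain Python with hypothetical function
names, and it models only whether a document matches, not how it is scored.

```python
def matches_required_optional(doc_terms, required, optional):
    """'+aa bb cc' style: only the required terms decide whether a document matches."""
    return all(t in doc_terms for t in required)


def optional_hits(doc_terms, optional):
    """Number of optional terms present; these would only influence ranking."""
    return sum(1 for t in optional if t in doc_terms)


def matches_and_or(doc_terms, required, alternatives):
    """'aa AND (bb OR cc)' style: at least one alternative must be present too."""
    return (all(t in doc_terms for t in required)
            and any(t in doc_terms for t in alternatives))


doc = {"aa"}  # a document that contains only the required term
print(matches_required_optional(doc, ["aa"], ["bb", "cc"]))
print(matches_and_or(doc, ["aa"], ["bb", "cc"]))
print(optional_hits({"aa", "cc"}, ["bb", "cc"]))
```

A document containing only aa matches the first form but not the second,
which is the gap an optional-clause syntax would have to fill.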
> 
> Some control over performance is outside the language.
> A basic query factory must be provided to create a Lucene query
> from a Surround query, and this throws an exception when
> rewriting causes too many terms to be used,
> much like the TooManyClauses for BooleanQuery.
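
The idea of a factory that limits term expansion might look roughly like the
sketch below. It is plain Python, and the class, exception and function names
are hypothetical; they are not the names used in Lucene or in the Surround
code.

```python
class TooManyTermsError(Exception):
    """Hypothetical exception: expansion needed more terms than allowed."""


class CountingQueryFactory:
    """Hypothetical factory that counts every term query it hands out."""

    def __init__(self, max_terms):
        self.max_terms = max_terms
        self.terms_used = 0

    def new_term_query(self, term):
        # Refuse to create more term queries once the limit is reached.
        if self.terms_used >= self.max_terms:
            raise TooManyTermsError(
                f"more than {self.max_terms} terms needed")
        self.terms_used += 1
        return ("term", term)  # stand-in for a real term query object


def expand_prefix(factory, prefix, index_terms):
    """Expand a prefix into one term query per matching index term."""
    return [factory.new_term_query(t)
            for t in index_terms if t.startswith(prefix)]


index_terms = ["parse", "parser", "parsing", "sparse"]
factory = CountingQueryFactory(max_terms=2)
try:
    expand_prefix(factory, "pars", index_terms)
except TooManyTermsError as exc:
    print("rejected:", exc)
```

Keeping the limit in the factory rather than in the query syntax means the
caller chooses the budget once, and every truncation in a query draws on it.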
> 
> 
> Regards,
> Paul Elschot
> 



