lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: contrib/surround
Date Sun, 05 Jun 2005 13:22:52 GMT
Paul,

I'm swamped this weekend and all this coming week moving  
(physically).  I would be happy to mentor someone tackling these  
changes.  Could you go ahead and put your ideas on the wiki and list  
me as the ASF mentor? (I know it says ASF members and committers, but  
feel free to add it on my behalf).

     Erik

On Jun 5, 2005, at 5:07 AM, Paul Elschot wrote:

> How about putting this here:
>
> http://wiki.apache.org/general/SummerOfCode2005
>
> It seems to be a nice fit for the sponsor.
>
> Regards,
> Paul Elschot
>
>
> On Saturday 04 June 2005 22:25, Paul Elschot wrote:
>
>> On Monday 30 May 2005 02:44, Erik Hatcher wrote:
>>
>>> I concur with Daniel on this.  For the moment, my preference is to
>>> bring in Paul's parser into contrib/surround and let it gain some
>>> additional exposure there.  I don't believe it's possible or even
>>> preferable to attempt to build one query parser to rule them all.
>>> While a decent general purpose one is handy, I'm finding that my
>>> projects really demand more custom parsing capabilities than the
>>> built-in QueryParser can handle and that the quirks of the current
>>> parser cause some frustrations sometimes.
>>>
>>> Perhaps over time, the built-in QueryParser can adopt some additional
>>> capabilities such as supporting the SpanQuery family but let's take
>>> that sort of thing slowly.
>>>
>>>
>>
>> How about extending the surround parser to allow the use of all
>> queries currently in Lucene? The goal would be to allow as many
>> queries as possible.
>>
>> The queries not available in the current surround parser are:
>> - FuzzyQuery, WildcardQuery, PrefixQuery
>> - SpanFirstQuery
>> - SpanNotQuery
>> - MultiPhraseQuery (or the various phrase scorers)
>> - optional terms/clauses
>>
>> FuzzyQuery and SpanFirstQuery could be done with a prefix operator
>> including a number (like the nn in the nnN near operator) followed by a
>> single query, with appropriate restrictions.
>> A prefix operator followed by a single query is currently not present,
>> but relatively easy to add.
>> SpanNotQuery always has two subqueries, so would need an infix operator
>> only.
>> MultiPhraseQuery would need an infix operator and a prefix operator, just
>> like the N and W operators, and a restriction to terms, truncations and OR
>> as subqueries.
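
A minimal sketch of how a numbered operator token such as the nnN near operator might be split into its parts. This is not Surround's actual parser: the function name, the default distance of 1, and the reading of W as ordered and N as unordered are assumptions made for illustration.

```python
import re

# Hypothetical token shape: optional distance digits followed by N or W,
# for example "3N" or "W".
_OPERATOR = re.compile(r"([0-9]*)([NW])")

def parse_distance_operator(token):
    """Split a token such as '3N' into (distance, ordered)."""
    match = _OPERATOR.fullmatch(token)
    if match is None:
        raise ValueError(f"not a distance operator: {token!r}")
    digits, letter = match.groups()
    # Assumption: a missing number means a distance of 1.
    distance = int(digits) if digits else 1
    if distance < 1:
        raise ValueError("distance must be at least 1")
    # Assumption: W is the ordered operator, N the unordered one.
    return distance, letter == "W"

print(parse_distance_operator("3N"))
print(parse_distance_operator("W"))
```

A numbered prefix operator for FuzzyQuery or SpanFirstQuery could reuse the same token shape with a different operator letter.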
>>
>> Left truncation could also be allowed; truncations currently have to
>> start with a normal character.
>> Truncation might also be left to WildcardQuery and
>> PrefixQuery instead of the current "equivalent" in Surround
>> that uses regular expressions to find the matching terms.
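
The regular-expression approach to truncation can be sketched roughly as follows. This is an illustrative model, not Surround's code: the wildcard syntax ('*' for any run of characters, '?' for exactly one) and the function names are assumptions, and the index is stood in for by a plain list of terms.

```python
import re

def truncation_to_regex(pattern):
    """Translate a truncation pattern into a compiled regular expression.

    Assumed syntax: '*' matches any run of characters, '?' exactly one.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            # Escape everything else so it only matches itself.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matching_terms(pattern, index_terms):
    """Return the index terms that the pattern matches in full."""
    regex = truncation_to_regex(pattern)
    return [term for term in index_terms if regex.fullmatch(term)]

terms = ["parse", "parser", "parsing", "sparse", "query"]
print(matching_terms("pars*", terms))   # right truncation
print(matching_terms("*parse", terms))  # left truncation
```

In this model a left truncation needs no extra machinery, but it has to test every term, whereas a pattern that starts with a normal character could restrict the scan to terms sharing that prefix.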
>>
>> That leaves the optional terms/clauses, and I can't think of an easy way
>> to handle these. Any ideas? OR does not work for this because it requires
>> at least one of its operands to match. The normal QueryParser syntax for
>> this is +aa bb cc, where bb and cc are the optional parts.
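
To make the difference concrete, here is a small model of the two semantics, with a document represented as a set of terms. The function names and the count-based score are assumptions for illustration only and do not reflect Lucene's scoring; the sketch also covers only the case with at least one required term.

```python
def matches_or(terms, doc_terms):
    """OR semantics: at least one of the terms must occur in the document."""
    return any(term in doc_terms for term in terms)

def matches_required_optional(required, optional, doc_terms):
    """'+aa bb cc' semantics: required terms decide whether the document
    matches; optional terms only raise a simple count-based score."""
    if not all(term in doc_terms for term in required):
        return False, 0
    score = len(required) + sum(1 for term in optional if term in doc_terms)
    return True, score

doc = {"aa", "dd"}
# The document contains neither bb nor cc, so an OR over them fails ...
print(matches_or(["bb", "cc"], doc))
# ... while +aa bb cc still matches, just without any boost from bb or cc.
print(matches_required_optional(["aa"], ["bb", "cc"], doc))
```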
>>
>> Some control over performance is outside the language.
>> A basic query factory must be provided to create a Lucene query
>> from a Surround query, and this throws an exception when
>> rewriting causes too many terms to be used,
>> much like the TooManyClauses for BooleanQuery.
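
The idea of a factory that caps the number of terms can be sketched as below. The class, exception, and function names are hypothetical, the term query is a plain tuple rather than a Lucene object, and the index is a plain list; only the budget mechanism is the point.

```python
class TooManyTerms(Exception):
    """Hypothetical counterpart to BooleanQuery's TooManyClauses."""

class BudgetedQueryFactory:
    """Hands out term queries and refuses once a fixed budget is used up."""

    def __init__(self, max_terms):
        self.max_terms = max_terms
        self.terms_used = 0

    def new_term_query(self, field, term):
        if self.terms_used >= self.max_terms:
            raise TooManyTerms(f"more than {self.max_terms} terms requested")
        self.terms_used += 1
        return (field, term)  # stand-in for a real term query object

def expand_prefix(factory, field, prefix, index_terms):
    """Rewrite a prefix truncation into one term query per matching term."""
    return [factory.new_term_query(field, term)
            for term in index_terms if term.startswith(prefix)]

terms = ["parse", "parser", "parsing", "query"]
print(expand_prefix(BudgetedQueryFactory(3), "body", "pars", terms))
try:
    expand_prefix(BudgetedQueryFactory(2), "body", "pars", terms)
except TooManyTerms as exc:
    print("rejected:", exc)
```

Because the budget lives in the factory rather than in the query language, the caller chooses the limit per query, which is what keeps this control outside the language.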
>>
>>
>> Regards,
>> Paul Elschot
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

