Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Message-ID: <1754577516.1258662220406.JavaMail.jira@brutus>
Date: Thu, 19 Nov 2009 20:23:40 +0000 (UTC)
From: "David Kaelbling (JIRA)" <jira@apache.org>
To: java-dev@lucene.apache.org
Subject: [jira] Issue Comment Edited: (LUCENE-2039) Regex support and beyond
 in JavaCC QueryParser
In-Reply-To: <1096156079.1257532832431.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/LUCENE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780193#action_12780193 ]

David Kaelbling edited comment on LUCENE-2039 at 11/19/09 8:22 PM:
-------------------------------------------------------------------

I apologize if I haven't read the comments carefully enough, but in LUCENE-2039_field_ext.patch why is ExtendableQueryParser final?  That means (for example) that ComplexPhraseQueryParser cannot subclass it.  In the earlier LUCENE-2039.patch the complex phrase parser picked up the changes for free.

And would RegexParserExtension maybe be easier to use if it set the RegexCapabilities on the new RegexQuery it is returning?


      was (Author: dkaelbling@blackducksoftware.com):
    I apologize if I haven't read the comments carefully enough, but in LUCENE-2039_field_ext.patch why is ExtendableQueryParser final?  That means (for example) that ComplexPhraseQueryParser cannot subclass it.  In the earlier LUCENE-2039.patch the complex phrase parser picked up the changes for free.

> Regex support and beyond in JavaCC QueryParser
> ----------------------------------------------
>
>                 Key: LUCENE-2039
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2039
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>            Reporter: Simon Willnauer
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2039.patch, LUCENE-2039_field_ext.patch
>
>
> Since the early days the standard query parser has been limited to the queries living in core; adding other queries or extending the parser in any way always forced people to change the grammar file and regenerate. Even if you change the grammar, you have to be extremely careful how you modify the parser so that other parts of the standard parser are not affected by customisation changes. Eventually you had to live with all the limitations the current parser has, like tokenizing on whitespace before a tokenizer / analyzer has the chance to look at the tokens.
> I was thinking about how to overcome this limitation and add regex support to the query parser without introducing any dependency to core. I added a new special character; the parser does not interpret any of the characters enclosed between two occurrences of it. I chose the forward slash '/' as the delimiter, so that everything in between two forward slashes is basically escaped and ignored by the parser. All chars embedded within forward slashes are treated as one token, even if they contain other special chars like * []?{} or whitespace. This token is subsequently passed to a pluggable "parser extension" which builds a query from the embedded string. I do not interpret the embedded string in any way but leave all the subsequent work to the parser extension. Such an extension could be another full-featured query parser itself or simply a ctor call for a regex query. The interface remains quite simple but makes the parser extensible in an easy way compared to modifying the JavaCC sources.
> The downside of this patch is clearly that I introduce a new special char into the syntax, but I guess that would not be that much of a deal, as it is reflected in the escape method. It would truly be nice to have more than one extension and to make this even more flexible, so treat this patch as a kickoff.
> Another way of solving the problem with RegexQuery would be to move the JDK version of regex into the core and simply have another method like:
> {code}
> protected Query newRegexQuery(Term t) {
>   ...
> }
> {code}
> which I would like better, as it would be more consistent with the idea of the query parser being a very strict and well-defined parser.
> I will upload a patch in a second which implements the extension-based approach. I guess I will add a second patch with regex in core soon too.
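
The slash-delimited escaping described in the issue can be sketched roughly as follows, in plain Java with no Lucene dependency. This is only an illustration under stated assumptions: the names SlashExtensionSketch, tokenize, isExtensionToken, embedded and Extension are hypothetical and are not taken from either attached patch, java.util.regex.Pattern merely stands in for a regex query, and escaped slashes inside the delimiters are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch; none of these names come from the LUCENE-2039 patches. */
public class SlashExtensionSketch {

    /** Stand-in for a pluggable parser extension: builds a "query" from the raw embedded string. */
    public interface Extension<Q> {
        Q parse(String embedded);
    }

    /**
     * Splits the input on whitespace, but keeps everything between two '/'
     * characters (delimiters included) as a single token, whatever it contains.
     * An unterminated '/' simply yields a trailing token without a closing slash.
     */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inSlashes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '/' && !inSlashes) {
                flush(current, tokens);      // end any plain term before the opening slash
                current.append(c);
                inSlashes = true;
            } else if (c == '/') {
                current.append(c);           // closing slash completes the escaped token
                flush(current, tokens);
                inSlashes = false;
            } else if (Character.isWhitespace(c) && !inSlashes) {
                flush(current, tokens);      // whitespace separates plain terms only
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    /** True if the token is wrapped in an opening and a closing slash. */
    public static boolean isExtensionToken(String token) {
        return token.length() >= 2 && token.startsWith("/") && token.endsWith("/");
    }

    /** Returns the uninterpreted string between the two slashes. */
    public static String embedded(String token) {
        return token.substring(1, token.length() - 1);
    }

    public static void main(String[] args) {
        // Assumption: the extension is just a constructor-style call, here Pattern.compile.
        Extension<Pattern> regexExtension = Pattern::compile;
        for (String token : tokenize("title:lucene /[a-z]+ query?/ parser")) {
            if (isExtensionToken(token)) {
                Pattern p = regexExtension.parse(embedded(token));
                System.out.println("extension -> " + p.pattern());
            } else {
                System.out.println("term      -> " + token);
            }
        }
    }
}
```

The point of the sketch is the division of labour: the tokenizer never looks inside the slashes, so the extension alone decides what the embedded string means.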

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org