chemistry-dev mailing list archives

From: "Kiessling, Heiko" <>
Subject: Re: Issues with
Date: Fri, 23 Sep 2011 21:58:47 GMT
Hi, Jens,

as I said, going with your original solution is also okay with me.


On 23.09.2011 at 22:53, "Jens Hübel" <> wrote:

> Heiko, I am not claiming to be the best expert on grammars and ANTLR. It is easy to define two lexer rules for different string literals, as you point out. However, in my opinion ANTLR will only extract a LIKE_LITERAL if the string starts with \% or \_; in all other cases it will extract a STRING_LIT and fail with an error if it later finds an escaped LIKE character in the string.
> There is no way to detect a LIKE_LITERAL with any fixed number of lookahead characters. Please note that ANTLR does not backtrack when it hits an error later on. You can switch on the option backtracking=true, but this would be very inefficient.
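To illustrate the lookahead problem: the escape that distinguishes the two token kinds can sit arbitrarily far from the opening quote, so the decision can only be made once the closing quote is reached. The following is a minimal hand-written Java sketch, not ANTLR-generated code; the names LiteralScanner and LiteralKind, and the escape set (\' and \\ everywhere, \% and \_ marking a LIKE literal), are assumptions for illustration.

```java
// Hypothetical token kinds; names mirror the ones used in this thread.
enum LiteralKind { STRING_LIT, LIKE_LIT }

final class LiteralScanner {
    /**
     * Scans a quoted literal starting at input.charAt(start) == '\''.
     * The kind is only known when the closing quote is reached, however
     * long the literal is, so no fixed lookahead could decide it up front.
     */
    static LiteralKind classify(String input, int start) {
        if (input.charAt(start) != '\'') {
            throw new IllegalArgumentException("literal must start with a quote");
        }
        boolean sawLikeEscape = false;
        int i = start + 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                if (i + 1 >= input.length()) {
                    break; // dangling backslash: reported as unterminated below
                }
                char next = input.charAt(i + 1);
                if (next == '%' || next == '_') {
                    sawLikeEscape = true; // only legal in a LIKE pattern
                } else if (next != '\'' && next != '\\') {
                    throw new IllegalArgumentException("invalid escape \\" + next + " at " + i);
                }
                i += 2; // skip the backslash and the escaped character
            } else if (c == '\'') {
                // Closing quote: only now can the token kind be decided.
                return sawLikeEscape ? LiteralKind.LIKE_LIT : LiteralKind.STRING_LIT;
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated string literal");
    }
}
```

For example, classify("'abc'", 0) yields STRING_LIT, while a literal with \_ just before its closing quote yields LIKE_LIT no matter how many characters precede it. A hand-written scanner can simply defer the decision like this; whether a given lexer generator can do the same depends on how it commits to a token rule.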
> There are also fancy features such as semantic predicates and other mechanisms for dealing with ambiguous grammars. However, I ran into lots of ugly issues when trying them for text search. A typical symptom is that ANTLR generates Java code that fails to compile with messages like "method too large". I ended up turning off all these features and looking for another solution.
> Unless somebody has a grammar patch that is simple and works, I still tend to stick to my previous solution.
> Jens
> -----Original Message-----
> From: Kiessling, Heiko [] 
> Sent: Friday, 23 September 2011 09:52
> To:
> Subject: Re: Issues with
> Hi, Jens,
> thanks for getting back on this.
> If the lexer generator you use has no means of excluding certain character sequences, and in effect of accepting certain literals whose context the parser can then check, I see no other option. To be more precise: would it be possible for the lexer generator to exclude "\%" and "\_" from normal string literals and to have the lexer accept a LIKE_LIT otherwise? Of course, the production rule for LIKE in the parser would then have to accept both STRING_LIT and LIKE_LIT on its right-hand side.
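Under this proposal the lexer emits two token kinds and the parser decides, per context, which one it accepts. A minimal Java sketch of that parser-side check; TokenType, Token and LiteralContext are hypothetical names, not part of OpenCMIS or ANTLR.

```java
// Hypothetical token kinds produced by the lexer under this proposal.
enum TokenType { STRING_LIT, LIKE_LIT }

// Minimal token holder: type plus raw text including the quotes.
final class Token {
    final TokenType type;
    final String text;

    Token(TokenType type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class LiteralContext {
    /**
     * The LIKE production accepts STRING_LIT or LIKE_LIT on its right-hand
     * side; every other literal position accepts only STRING_LIT.
     */
    static void accept(Token literal, boolean inLikePredicate) {
        boolean allowed = literal.type == TokenType.STRING_LIT
                || (inLikePredicate && literal.type == TokenType.LIKE_LIT);
        if (!allowed) {
            throw new IllegalArgumentException(
                    "escaped % or _ is only allowed in a LIKE pattern: " + literal.text);
        }
    }
}
```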
> The lexer generator I used a few years ago (REX) could do this, but if it is not possible here, I am +1 for your proposal.
> Cheers
> Heiko
> On 22.09.2011 at 21:49, "Jens Hübel" <> wrote:
>> Hi Heiko,
>> again, sorry for the long delay in my reply. After a lot of travel I am now able to look into this issue. You are absolutely right: backslash escaping of underscore and percent characters is not supported in LIKE at the moment, and this is not what the spec says.
>> It is no problem to extend the grammar to support this. However, this has a certain impact. On the lexical level we can only have one kind of string literal, and there is no context telling us whether we are in a LIKE expression or anywhere else. This means backslash escaping of percent and underscore will then be allowed in any string literal. Throwing an exception in all cases where we are not in a LIKE expression then becomes the job of the user code, not of the parser framework. The best we can do is provide helper functions for unescaping to make this a bit easier.
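With a single kind of string literal, code that handles a literal outside a LIKE predicate has to reject \% and \_ itself. A minimal Java sketch of such a helper, assuming \' and \\ are the only other valid escapes; the class LiteralUnescaper and its method are hypothetical, not an existing OpenCMIS API.

```java
final class LiteralUnescaper {
    /**
     * Strips the quotes from a literal used outside LIKE and resolves \' and \\.
     * Rejects \% and \_, which the lexer now accepts everywhere.
     */
    static String unescapeNonLike(String quoted) {
        if (quoted.length() < 2 || quoted.charAt(0) != '\''
                || quoted.charAt(quoted.length() - 1) != '\'') {
            throw new IllegalArgumentException("not a quoted literal: " + quoted);
        }
        String body = quoted.substring(1, quoted.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 == body.length()) {
                throw new IllegalArgumentException("dangling backslash in " + quoted);
            }
            char next = body.charAt(++i);
            if (next == '\'' || next == '\\') {
                out.append(next); // ordinary escape: keep the character itself
            } else if (next == '%' || next == '_') {
                throw new IllegalArgumentException("\\" + next + " is only allowed in a LIKE pattern");
            } else {
                throw new IllegalArgumentException("invalid escape \\" + next);
            }
        }
        return out.toString();
    }
}
```

The LIKE branch would not use this helper: unescaping \% to % there would erase the difference between a literal percent sign and a wildcard, so the raw literal text is better handed to a pattern matcher as-is.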
>> If everyone is fine with this approach I will change the lexer grammar.
>> Jens
>> -----Original Message-----
>> From: Kiessling, Heiko [] 
>> Sent: Monday, 12 September 2011 16:45
>> To:
>> Subject: RE: Issues with
>> Hi, Jens,
>> thanks for your quick reply. In the meantime I got the snapshot 'chemistry-opencmis-server-support-0.5.0-20110911.030458-142.jar', but it still has the problem with the escaping mechanism. The WHERE clause I am trying is:
>> WHERE cmis:name LIKE 'Do\\%ent'
>> Thanks and best regards
>> Heiko
>> ----------------
>> You wrote:
>> Hi Heiko,
>> are you using the latest snapshot from SVN? Since the last release there have been several fixes and enhancements to the escaping mechanism. Please use the latest version from the trunk if you don't have it yet, and let me know if this still does not work as expected. (A new release will be available soon.)
>> There is no semantic analysis of any kind in the framework. It is just the parser, and any error handling beyond basic syntax errors is up to you.
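Because semantic checks are left to the caller, a rule such as the = versus = ANY restriction raised in the original question has to live in user code. A minimal Java sketch; the Cardinality enum and the map of property cardinalities are stand-ins for the repository's type definitions, the property id my:tags is hypothetical, and a real implementation would presumably throw CmisInvalidArgumentException rather than IllegalArgumentException.

```java
import java.util.Map;

// Stand-in for a property definition's cardinality.
enum Cardinality { SINGLE, MULTI }

final class ComparisonChecker {
    /**
     * Checks an equality predicate against the property's cardinality:
     * "= ANY" requires a multi-valued property, plain "=" a single-valued one.
     */
    static void checkEquals(String propertyId, boolean isAny,
                            Map<String, Cardinality> cardinalities) {
        Cardinality cardinality = cardinalities.get(propertyId);
        if (cardinality == null) {
            throw new IllegalArgumentException("unknown property: " + propertyId);
        }
        if (isAny && cardinality != Cardinality.MULTI) {
            throw new IllegalArgumentException("= ANY needs a multi-valued property: " + propertyId);
        }
        if (!isAny && cardinality != Cardinality.SINGLE) {
            throw new IllegalArgumentException("= needs a single-valued property: " + propertyId);
        }
    }
}
```

A caller would run this while walking the parse tree, once per comparison node, after the parser has accepted the statement syntactically.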
>> Hope this helps....
>> Jens
>> -----Original Message-----
>> From: Kiessling, Heiko []
>> Sent: Wednesday, 7 September 2011 18:23
>> To:
>> Subject: Issues with
>> Hi,
>> in the course of implementing CMIS queries I have found the following problems with the above method:
>> - The parser does not accept escaping backslashes in LIKE strings. For example, the string 'pa\%ern', which according to the CMIS spec should match a literal '%' character, is answered with the two messages "mismatched character '%' expecting set null" and "mismatched character '<EOF>' expecting '''" and a CmisInvalidArgumentException. This sounds like a lexical analysis problem to me.
>> - Is there semantic analysis built in? For example, the = ANY operator is not allowed for single-valued properties and, vice versa, the simple = operator is not allowed for multi-valued properties. However, no error is reported when parsing this kind of statement.
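For reference, once the escapes are accepted, matching a LIKE pattern means treating % as any sequence of characters, _ as exactly one character, and \% and \_ as the literal characters. A minimal Java sketch that translates the body of a LIKE literal (quotes already stripped) into a java.util.regex.Pattern; the class name LikePatterns and the treatment of \' and \\ as literal escapes are assumptions.

```java
import java.util.regex.Pattern;

final class LikePatterns {
    /**
     * Translates a LIKE pattern body into a regex: % -> .*, _ -> .,
     * and \%, \_, \', \\ -> the literal character.
     */
    static Pattern toRegex(String likeBody) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder(); // pending run of literal chars
        for (int i = 0; i < likeBody.length(); i++) {
            char c = likeBody.charAt(i);
            if (c == '\\') {
                if (i + 1 == likeBody.length()) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                char next = likeBody.charAt(++i);
                if ("%_'\\".indexOf(next) < 0) {
                    throw new IllegalArgumentException("invalid escape \\" + next);
                }
                literal.append(next); // escaped wildcard is matched literally
            } else if (c == '%' || c == '_') {
                flush(literal, regex);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        // DOTALL so that wildcards also match line breaks in property values.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Appends the pending literal run, quoted so regex metacharacters are inert.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }
}
```

For example, toRegex("pa\\%ern") (Java source for pa\%ern) matches "pa%ern" but not "pattern", whereas toRegex("Do%ent") matches "Document". Literal runs go through Pattern.quote, so characters such as . or ( in the value need no special handling.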
>> It would be great if you could tell us whether these are known limitations that are being worked on, or whether we are making a mistake.
>> Thanks and best regards
>> Heiko Kiessling
>> Senior Developer
>> TIP CORE Conn., Security, Integr. (AG)
>> SAP AG | Dietmar-Hopp-Allee 16 | 69190 Walldorf, Germany
>> T + 49 6227 745434 | F + 49 6227 7822615
>> E<> |<>
>> Mandatory Disclosure Statements:
>> This e-mail may contain trade secrets or privileged, undisclosed, or otherwise confidential information. If you have received this e-mail in error, you are hereby notified that any review, copying, or distribution of it is strictly prohibited. Please inform us immediately and destroy the original transmittal. Thank you for your cooperation.