chemistry-dev mailing list archives

From "Kiessling, Heiko" <>
Subject RE: Issues with
Date Thu, 29 Sep 2011 14:25:37 GMT
Hi, Jens,

I think I've found another small parser problem: it doesn't like qualified asterisks in the
SELECT clause, that is something like "SELECT cmis:document.* FROM cmis:document".


-----Original Message-----
From: Jens Hübel [] 
Sent: Friday, 23 September 2011 22:53
Subject: RE: Issues with

Heiko, I am not claiming to be the best expert on grammars and ANTLR. It is easy to define
two lexer rules for different string literals as you point out. However, in my opinion ANTLR
will only extract a LIKE_LITERAL if the string starts with a \% or \_ and in all other cases
extract a STRING_LIT and fail with an error if it later finds an escaped LIKE character in
the string.

There is no way to detect a LIKE_LITERAL for any fixed number of lookahead characters. Please
note that ANTLR does not do backtracking in case of later errors. You can switch on the option
backtrack=true but this would be very inefficient.

There are also fancy features like semantic predicates and other things to deal with ambiguous
grammars. However I ran into lots of ugly issues when trying those for text search. A typical
symptom is that it generates Java code that fails to compile with messages like "method too
large". I ended up turning off all these features and looking for another solution.

Unless somebody has a grammar patch that is simple and works I still tend to stick to my previous


-----Original Message-----
From: Kiessling, Heiko [] 
Sent: Friday, 23 September 2011 09:52
Subject: Re: Issues with

Hi, Jens,

thanks for getting back on this.

If the lexer generator you use has no means to exclude certain character sequences, and in
effect to accept certain literals for which then the parser can check context, I see no other
option. To be more precise: would it be possible for the lexer generator to exclude "\%" and
"\_" from normal string literals and have the lexer accept a LIKE_LIT otherwise? Of course,
the production rule for LIKE in the parser then has to accept both STRING_LIT and LIKE_LIT
on the right side.

The lexer generator I used a few years ago (REX) could do this, but if it's not possible here,
I +1 for your proposal.
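
For illustration, the proposal endorsed here (a single string literal at the lexer level, with the LIKE-context check and unescaping left to user code) might look roughly like the sketch below. It is not part of OpenCMIS: the class and method names are hypothetical, and the escape rules (\' and \\ in any literal, \% and \_ only in LIKE patterns) are an assumed reading of the CMIS query grammar. Both methods take the literal body without its surrounding quotes.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenCMIS API: a sketch of user-side unescaping.
public final class LikeEscapeSketch {

    private LikeEscapeSketch() {
    }

    /** Unescapes a literal used outside LIKE; \% and \_ are rejected there. */
    public static String unescapeString(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = escapedChar(body, i);
            if (next == '%' || next == '_') {
                throw new IllegalArgumentException(
                        "\\" + next + " is only allowed inside a LIKE pattern");
            }
            out.append(next);
            i++; // skip the escaped character
        }
        return out.toString();
    }

    /** Converts a LIKE pattern to a regex: % and _ are wildcards unless escaped. */
    public static Pattern likeToRegex(String body) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\') {
                // Escaped character is matched literally.
                regex.append(Pattern.quote(String.valueOf(escapedChar(body, i))));
                i++; // skip the escaped character
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Returns the character after the backslash, or throws if the escape is invalid.
    private static char escapedChar(String body, int backslashIndex) {
        if (backslashIndex + 1 >= body.length()) {
            throw new IllegalArgumentException("dangling backslash at end of literal");
        }
        char next = body.charAt(backslashIndex + 1);
        if (next != '\'' && next != '\\' && next != '%' && next != '_') {
            throw new IllegalArgumentException("unsupported escape: \\" + next);
        }
        return next;
    }
}
```

With this split, the code walking the parse tree would call likeToRegex for the right-hand side of LIKE and unescapeString for every other literal, so the context check lives in one place outside the grammar.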


On 22.09.2011 at 21:49, "Jens Hübel" <> wrote:

> Hi Heiko,
> again, sorry for the long delay in my reply. After a lot of travel I am now able to look
> into this issue. You are absolutely right. Backslash escaping for underscore and percent characters
> is not supported at the moment for LIKE and this is not what the spec says.
> It is no problem to extend the grammar to support this. However, this has a certain impact.
> On the lexical level we can only have one kind of string literal and there is no context whether
> we are in a LIKE expression or anywhere else. This means backslash escaping for percent and
> underscore is then allowed for any kind of string literal. Throwing an exception in all other
> cases where we are not in a LIKE expression is then part of the user code and not the parser
> framework. The best we can do is provide helper functions for unescaping to make this a bit
> If everyone is fine with this approach I will change the lexer grammar.
> Jens
> -----Original Message-----
> From: Kiessling, Heiko [] 
> Sent: Monday, 12 September 2011 16:45
> To:
> Subject: RE: Issues with
> Hi, Jens,
> thanks for your quick reply. In the meantime I got the snapshot 'chemistry-opencmis-server-support-0.5.0-20110911.030458-142.jar',
> but it still has the problem with the escaping mechanism. The WHERE clause I tried is
> 'WHERE cmis:name LIKE 'Do\\%ent''.
> Thanks and best regards
> Heiko
> ----------------
> You wrote:
> Hi Heiko,
> are you using the latest snapshot from SVN? Since the last release there are several fixes
> and enhancements to the escaping mechanism. Please use the latest version from the trunk if
> you don't have it and let me know if this still does not work as expected. (A new release
> will be available soon).
> There is no kind of semantic analysis in the framework. It is just the parser and any
> handling except basic syntax errors is up to you.
> Hope this helps....
> Jens
> -----Original Message-----
> From: Kiessling, Heiko []
> Sent: Wednesday, 7 September 2011 18:23
> To:
> Subject: Issues with
> Hi,
> in the course of implementing CMIS queries I have found the following problems with the
> method:
> -       The parser does not accept escaping backslashes in LIKE strings. For example,
> string 'pa\%ern' which according to the CMIS spec is supposed to look for the value 'pa%ern'
> is acknowledged with the two messages "mismatched character '%' expecting set null" and
> character '<EOF>' expecting '''" and a CmisInvalidArgumentException. Sounds like a lexical
> analysis problem to me.
> -       Is there semantic analysis built in? For example, the = ANY operator is not possible
> for single-valued properties and, vice versa, the simple = operator is not allowed for multi-valued
> properties. However, no error is reported when parsing this kind of statement.
> Would be great if you could tell us whether these are known limitations at this time but are
> being worked on, or whether we're making any mistakes.
> Thanks and best regards
> Heiko Kiessling
> Senior Developer
> TIP CORE Conn., Security, Integr. (AG)
> SAP AG | Dietmar-Hopp-Allee 16 | 69190 Walldorf, Germany
> T + 49 6227 745434 | F + 49 6227 7822615
> Pflichtangaben/Mandatory Disclosure Statements:
> This e-mail may contain trade secrets or privileged, undisclosed, or otherwise confidential
> information. If you have
> received this e-mail in error, you are hereby notified that any review, copying, or distribution
> of it is strictly prohibited.
> Please inform us immediately and destroy the original transmittal. Thank you for your
