chemistry-dev mailing list archives

From Michael Dürig <mic...@gmail.com>
Subject Re: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Date Thu, 31 Mar 2011 12:43:29 GMT

> Note though that SELECT * FROM cmis:document WHERE CONTAINS
> ('\u4E2D\u6587') isn't actually legal CMISQL, as currently CMISQL has
> no notion of Unicode escaping. The query would have to contain actual
> Unicode characters.

But doesn't this query contain actual Unicode characters? \u4E2D and
\u6587 are Java Unicode escapes [1], so in Java source the compiler
turns them into the characters themselves.
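
To illustrate the point, here is a minimal sketch, assuming the query
is written as a Java string literal. The class and method names are
made up for this example. It compares the escaped literal with one
whose backslashes are doubled, which is what a parser would see if the
escapes really reached it as text.

```java
final class UnicodeEscapeDemo {
    // Formats every char of s as U+XXXX, separated by single spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The compiler translates these escapes, so the string holds two characters.
        String escaped = "\u4E2D\u6587";
        // Doubled backslashes suppress the translation: twelve literal characters.
        String literal = "\\u4E2D\\u6587";

        System.out.println(escaped.length() + " " + codePoints(escaped)); // 2 U+4E2D U+6587
        System.out.println(literal.length());                             // 12
    }
}
```

Only the second form would need a query language that understands
Unicode escaping.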

Michael
[1] http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850

> NB: Unicode escaping is only specified in SQL-2008, not SQL-92. See
> this for a summary:
> http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#N11E65
>
> Florent
>
> On Thu, Mar 31, 2011 at 2:00 PM, Florent Guillaume <fg@nuxeo.com> wrote:
>> No objection, I probably wasn't aware of ANTLRStringStream when I
>> wrote that code.
>>
>> Florent
>>
>> On Thu, Mar 31, 2011 at 12:47 PM, Jens Hübel <jhuebel@opentext.com> wrote:
>>> Florent,
>>>
>>> as far as I remember this code came originally from your side. Would you have any objections to applying the proposed patch? Would this break something on your side?
>>>
>>> Jens
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Jens Hübel (JIRA) [mailto:jira@apache.org]
>>> Sent: Thursday, 31 March 2011 12:42
>>> To: dev@chemistry.apache.org
>>> Subject: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
>>>
>>>
>>>      [ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Jens Hübel reassigned CMIS-344:
>>> -------------------------------
>>>
>>>     Assignee: Jens Hübel
>>>
>>>> Query parser should not use UTF-8 encoding
>>>> ------------------------------------------
>>>>
>>>>                  Key: CMIS-344
>>>>                  URL: https://issues.apache.org/jira/browse/CMIS-344
>>>>              Project: Chemistry
>>>>           Issue Type: Bug
>>>>           Components: opencmis-server
>>>>     Affects Versions: OpenCMIS 0.4.0
>>>>             Reporter: Michael Dürig
>>>>             Assignee: Jens Hübel
>>>>          Attachments: CMIS-344.patch
>>>>
>>>>
>>>> QueryUtil converts the query statement to a UTF-8 encoded byte array, which is used as input to the lexer instead of using the string directly.
>>>> Instead of
>>>>      CharStream input = new ANTLRInputStream(new ByteArrayInputStream(statement.getBytes("UTF-8")));
>>>> the input stream should be obtained like this:
>>>>      CharStream input = new ANTLRStringStream(statement);
>>>> The former method transforms the characters in the CONTAINS clause of the query
>>>>      SELECT * FROM cmis:document WHERE CONTAINS ('\u4E2D\u6587')
>>>> in an incorrect way.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>
>>
>>
>>
>> --
>> Florent Guillaume, Director of R&D, Nuxeo
>> Open Source, Java EE based, Enterprise Content Management (ECM)
>> http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87
>>
>
>
>
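
On the issue quoted above: the byte-array detour is only lossless if
the bytes are decoded again as UTF-8. The sketch below uses plain JDK
classes rather than ANTLR, with hypothetical class and method names.
ISO-8859-1 stands in for a platform default charset that is not UTF-8.
That the one-argument ANTLRInputStream constructor falls back to the
platform default is my assumption about the cause, not something the
issue states.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingRoundTripDemo {
    // Encodes the statement as UTF-8 bytes, then decodes the bytes with
    // the given charset, as a reader unaware of the encoding would do.
    static String roundTrip(String statement, Charset decodeWith) throws IOException {
        byte[] bytes = statement.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), decodeWith)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String statement = "SELECT * FROM cmis:document WHERE CONTAINS ('\u4E2D\u6587')";

        String viaUtf8 = roundTrip(statement, StandardCharsets.UTF_8);
        String viaLatin1 = roundTrip(statement, StandardCharsets.ISO_8859_1);

        System.out.println(statement.equals(viaUtf8));                // true
        System.out.println(statement.equals(viaLatin1));              // false
        // Each of the two characters is three UTF-8 bytes, hence three chars after decoding.
        System.out.println(viaLatin1.length() - statement.length());  // 4
    }
}
```

Passing the string straight to the lexer, as the patch does with
ANTLRStringStream, avoids the encode and decode steps altogether.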

