camel-issues mailing list archives

From "Franz Forsthofer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CAMEL-8905) encoding problems in jsonpath
Date Fri, 17 Jul 2015 09:14:05 GMT

    [ https://issues.apache.org/jira/browse/CAMEL-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631073#comment-14631073 ]

Franz Forsthofer commented on CAMEL-8905:
-----------------------------------------

jsonpath uses the json-smart parser by default. I also think that we could use the proposed
patch as an interim solution until all parsers support encoding detection.

Concerning stream caching: the stream wrapper does not affect the stream caching functionality
because the stream wrapper is never returned as the body of the Camel message, at least not in
the proposed solution. Of course, one could think of other use cases in which the stream wrapper
is set as the message body. For example, you could write a processor whose only purpose
is to detect the encoding of a JSON document and which returns the stream wrapper in
the body. In that case, one would have to test whether stream caching works. My feeling is that it
will work, but this is untested. As I said, this is currently not the use case
for the stream wrapper.

So, shall I go on and commit the change?

> encoding problems in jsonpath
> -----------------------------
>
>                 Key: CAMEL-8905
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8905
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-jsonpath
>    Affects Versions: 2.15.2
>            Reporter: Franz Forsthofer
>             Fix For: 2.16.0, 2.15.3
>
>         Attachments: 0001-jsonpath-automatic-charset-detection.patch, booksUTF16BE.json,
> booksUTF16LE.json, jsonUCS2BigEndianWithBOM.txt, jsonUCS2BigEndianWithoutBOM.txt, jsonUCS2LittleEndianWithBom.txt,
> jsonUCS2LittleEndianWithoutBOM.txt, jsonUTF32BEWithBOM.txt, jsonUTF32BEWithoutBOM.txt, jsonUTF32LEWithBOM.txt,
> jsonUTF32LEWithoutBOM.txt
>
>
> I detected three different encoding problems in jsonpath:
> - If jsonpath is called with an input stream whose encoding differs from the
> default encoding (given by Charset.defaultCharset()), jsonpath still uses the default
> encoding. Error location in JsonPathEngine:
>         else if (json instanceof InputStream) {
>             InputStream is = (InputStream) json;
>             return path.read(is, Charset.defaultCharset().displayName(), configuration);
>         }
>       
> - If jsonpath is called with a JSON file whose encoding is not UTF-8,
> jsonpath still parses the document as UTF-8. Error location in JsonPathEngine:
>        else if (json instanceof File) {
>             File file = (File) json;
>             return path.read(file, configuration);
>        }
>   path.read(file, configuration) always uses UTF-8.
> - If jsonpath is called with a URL pointing to a JSON document whose encoding is not
> UTF-8, jsonpath still parses the document as UTF-8. Error location in JsonPathEngine:
>          else if (json instanceof URL) {
>             URL url = (URL) json;
>             return path.read(url, configuration);
>          }
>   path.read(url, configuration) uses UTF-8.
> My proposed solution is to determine the encoding of the JSON document automatically
> according to RFC 4627 (https://www.ietf.org/rfc/rfc4627.txt; see section
> 3, Encoding) and then call path.read(jsonDocument, foundEncoding, configuration)
> with the detected encoding. See the attached patch.
> I can commit the patch myself. However, I would like somebody who is more
> familiar with jsonpath than I am to review it.
> Please tell me whether the patch can be accepted. I will then either do the actual commit
> or discard the patch.
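
For readers who do not have the patch at hand, the detection rule from RFC 4627, section 3, can be sketched roughly as follows. This is only an illustration and not the code of the attached patch: the class name JsonEncodingDetector and the method detect are hypothetical, the sketch falls back to UTF-8 when fewer than four octets are available, and it does not handle a leading byte order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Camel or of the attached patch. */
final class JsonEncodingDetector {

    private JsonEncodingDetector() {
    }

    /**
     * Guesses the encoding of a JSON text from its first four octets.
     * RFC 4627 states that a JSON text starts with two ASCII characters,
     * so the positions of the zero octets reveal the encoding:
     *
     *   00 00 00 xx  UTF-32BE
     *   00 xx 00 xx  UTF-16BE
     *   xx 00 00 00  UTF-32LE
     *   xx 00 xx 00  UTF-16LE
     *   xx xx xx xx  UTF-8
     */
    static Charset detect(byte[] head) {
        // Assumption of this sketch: too little data means UTF-8.
        if (head == null || head.length < 4) {
            return StandardCharsets.UTF_8;
        }
        boolean z0 = head[0] == 0;
        boolean z1 = head[1] == 0;
        boolean z2 = head[2] == 0;
        boolean z3 = head[3] == 0;

        if (z0 && z1 && z2 && !z3) {
            // UTF-32 is not in StandardCharsets on older JDKs, so look it up by name.
            return Charset.forName("UTF-32BE");
        }
        if (z0 && !z1 && z2 && !z3) {
            return StandardCharsets.UTF_16BE;
        }
        if (!z0 && z1 && z2 && z3) {
            return Charset.forName("UTF-32LE");
        }
        if (!z0 && z1 && !z2 && z3) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8;
    }
}
```

The name of the detected charset could then be passed as the encoding argument of path.read(jsonDocument, foundEncoding, configuration). When the input is an InputStream, the four octets have to be read without consuming them, for example via mark/reset on a BufferedInputStream or via a PushbackInputStream.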



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
