Date: Mon, 12 Nov 2012 23:59:12 +0000 (UTC)
From: "Nathan Beyer (JIRA)"
To: dev@thrift.apache.org
Subject: [jira] [Commented] (THRIFT-1727) Ruby-1.9: data loss: "binary" fields are re-encoded

    [ https://issues.apache.org/jira/browse/THRIFT-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495767#comment-13495767 ]

Nathan Beyer commented on THRIFT-1727:
--------------------------------------

{quote}This is not precisely the same as enforcing that "binary" fields get only BINARY encoded strings and "string" fields get only non-BINARY encoded strings, but this is due to the peculiarity of the Thrift specification that there is compatibility between these two types of strings at the Thrift specification level.
But because there is compatibility between these two types of strings at the Ruby level, it actually fits quite nicely.{quote}

This approach would only work when sending data via the Ruby code, not when receiving it. For example, a Ruby client invoking a service that returns 'binary' data couldn't use this check, because every String the Ruby code receives would be BINARY/ASCII-8BIT. The Ruby code would therefore have to attempt to decode the bytes of every Thrift string it receives as UTF-8, per the type specification.

I think the only valid short-term approach is to try and get the field information into the protocols. Strategically, I think 'binary' needs to be elevated to a base type. There could be some intermediate approaches, such as creating a ByteBuffer type that can be passed around in the generated code and that the protocol classes could then use as an indicator of the 'binary' type. I assume this is what happens in other libraries, like Java.

> Ruby-1.9: data loss: "binary" fields are re-encoded
> ---------------------------------------------------
>
>                 Key: THRIFT-1727
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1727
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.9
>         Environment: JRuby 1.6.8 using "--1.9" command line parameter.
>            Reporter: XB
>
> When setting a binary field of a Thrift object with some binary data (e.g. a string whose encoding is "ASCII-8BIT") and then serializing this object, the binary data is re-encoded. That is, it is encoded as if it were not a sequence of bytes but a sequence of characters, encoded using the ISO-8859-1 encoding. This assumed ISO-8859-1 sequence of characters is then converted into UTF-8 (by BinaryProtocol or CompactProtocol). This basically means that all bytes whose values are between 0x80 (inclusive) and 0x100 (exclusive) are converted into multi-byte sequences. This leads to data corruption.
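The corruption described in the report can be reproduced with plain Ruby String operations. The sketch below mimics the described conversion: it reinterprets the bytes as ISO-8859-1 characters and then transcodes them to UTF-8. It is not the Thrift Ruby library's actual code path. The helper names `reencoded_bytes` and `raw_bytes` are hypothetical, and the sketch assumes Ruby 1.9 or later.

```ruby
# Four raw bytes; pack("C*") returns an ASCII-8BIT (BINARY) string.
payload = [0x00, 0x7F, 0x80, 0xFF].pack("C*")

# Hypothetical helper mimicking the reported behaviour: treat each byte as an
# ISO-8859-1 character, then transcode the resulting text to UTF-8.
def reencoded_bytes(str)
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8).bytes.to_a
end

# Hypothetical helper showing what a 'binary' field should put on the wire:
# the original bytes, untouched.
def raw_bytes(str)
  str.dup.force_encoding(Encoding::BINARY).bytes.to_a
end

p raw_bytes(payload)        # => [0, 127, 128, 255]
p reencoded_bytes(payload)  # => [0, 127, 194, 128, 195, 191]
```

Each byte from 0x80 to 0xFF becomes a two-byte UTF-8 sequence, so the 4-byte payload turns into 6 bytes.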
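The intermediate approach suggested in the comment is a dedicated type for 'binary' values that the protocol classes can recognize. It might look roughly like the sketch below. `ByteBuffer` and `SketchProtocol` are hypothetical names, and neither is part of the Thrift Ruby library. The length-prefixed layout (a 4-byte big-endian length followed by the bytes) is assumed to loosely follow how the binary protocol writes strings.

```ruby
# Hypothetical marker type for values of 'binary' fields. Generated code would
# wrap binary field values in it so the protocol can tell them apart from text.
class ByteBuffer
  attr_reader :data

  def initialize(str)
    # Keep a BINARY copy so the bytes are never reinterpreted as characters.
    @data = str.dup.force_encoding(Encoding::BINARY)
  end
end

# Hypothetical protocol fragment: writes a 4-byte big-endian length prefix
# followed by the payload bytes into an in-memory BINARY buffer.
class SketchProtocol
  attr_reader :buffer

  def initialize
    @buffer = String.new.force_encoding(Encoding::BINARY)
  end

  def write_string(value)
    raw =
      if value.is_a?(ByteBuffer)
        value.data # 'binary' field: bytes go out unchanged
      else
        # 'string' field: transcode to UTF-8, then view the result as raw bytes
        value.encode(Encoding::UTF_8).dup.force_encoding(Encoding::BINARY)
      end
    @buffer << [raw.bytesize].pack("N") << raw
    self
  end
end

bin = SketchProtocol.new.write_string(ByteBuffer.new([0x80, 0xFF].pack("C*")))
txt = SketchProtocol.new.write_string("\u00e9") # U+00E9, e with acute accent
p bin.buffer.bytes.to_a # => [0, 0, 0, 2, 128, 255]
p txt.buffer.bytes.to_a # => [0, 0, 0, 2, 195, 169]
```

A marker like this could also help with the receiving side raised in the comment. If field information reaches the protocol, a reader could return a `ByteBuffer` for 'binary' fields and attempt UTF-8 decoding only for 'string' fields.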