Date: Thu, 21 Jan 2016 15:28:39 +0000 (UTC)
From: "ASF subversion and git services (JIRA)"
To: dev@avro.apache.org
Subject: [jira] [Commented] (AVRO-1783) Gracefully handle strings with wrong character encoding

[ https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110769#comment-15110769 ]

ASF subversion and git services commented on AVRO-1783:
-------------------------------------------------------

Commit 1725988 from [~martinkl] in branch 'avro/trunk'
[ https://svn.apache.org/r1725988 ]
AVRO-1783. Ruby: Ensure correct binary encoding for byte strings.
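To illustrate why a string with the wrong encoding tag can fail a {{fixed}} size check, here is a minimal plain-Ruby sketch in the spirit of the commit above. It does not use Avro: the 16 bytes are made up (eight copies of the UTF-8 sequence for "é") rather than a real MD5 digest, and the helpers {{to_binary}} and {{valid_fixed?}} are hypothetical, not part of the Avro Ruby API. It shows that {{String#size}} counts characters while {{String#bytesize}} counts bytes, and that re-tagging the string as binary (ASCII-8BIT) makes the two agree.

```ruby
# Sketch only: to_binary and valid_fixed? are hypothetical helpers,
# not the Avro Ruby API.

# Return the byte string tagged as binary (ASCII-8BIT), without mutating
# the caller's string.
def to_binary(bytes)
  return bytes if bytes.encoding == Encoding::BINARY
  bytes.dup.force_encoding(Encoding::BINARY)
end

# Check a datum against a `fixed` schema of the given size by comparing
# byte counts, so the string's encoding tag cannot change the result.
def valid_fixed?(datum, size)
  datum.is_a?(String) && datum.bytesize == size
end

# 16 made-up bytes: eight copies of 0xC3 0xA9, which is valid UTF-8 for "é".
raw = ([0xC3, 0xA9] * 8).pack('C*')

# Simulate a digest that came back tagged as UTF-8 instead of binary.
mis_tagged = raw.dup.force_encoding(Encoding::UTF_8)

puts mis_tagged.size               # 8 characters
puts mis_tagged.bytesize           # 16 bytes
puts to_binary(mis_tagged).size    # 16, since binary strings count bytes
puts mis_tagged.size == 16         # false: a #size-based check rejects it
puts valid_fixed?(mis_tagged, 16)  # true
```

Comparing {{#bytesize}}, or re-tagging the string as binary before calling {{#size}}, keeps the check independent of whatever encoding a digest library happens to attach to its result.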
> Gracefully handle strings with wrong character encoding
> -------------------------------------------------------
>
>                 Key: AVRO-1783
>                 URL: https://issues.apache.org/jira/browse/AVRO-1783
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.7.7
>            Reporter: Martin Kleppmann
>            Assignee: Martin Kleppmann
>         Attachments: AVRO-1783-2.patch, AVRO-1783.patch, AVRO-1783.stack.text
>
>
> In the [vote thread for Avro 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E], [~busbey] noticed that [phunt's avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq" is not an example of schema {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
> write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
> write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
> each at org/jruby/RubyArray.java:1613
> write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
> write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
> write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
> write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
> request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
> request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
> (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and with that particular version of JRuby I was able to reproduce the issue.
>
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 returns a UTF-8 encoded string from {{Digest::MD5.digest}} rather than a binary-encoded string. {{Schema.validate}} checks whether the string is suitable for writing as a datum of a {{fixed}} type by calling {{#size}}, which counts characters rather than bytes. In this case, although the MD5 digest of the schema is a 16-byte string, interpreted as a UTF-8 encoded string it consists of only 13 characters (i.e. some byte sequences are interpreted as multibyte characters), so validation fails.
>
> Rather than trying to divine why JRuby is being weird here, I think this is an opportunity to fix Avro's handling of strings to make it robust against unexpected encodings.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)