avro-dev mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-2048) Avro Binary Decoding - Gracefully Handle Long Strings
Date Fri, 14 Jul 2017 21:21:00 GMT
BELUGA BEHR created AVRO-2048:
---------------------------------

             Summary: Avro Binary Decoding - Gracefully Handle Long Strings
                 Key: AVRO-2048
                 URL: https://issues.apache.org/jira/browse/AVRO-2048
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.8.2, 1.7.7
            Reporter: BELUGA BEHR
            Priority: Minor


According to the [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_primitive]:

bq. a string is encoded as a *long* followed by that many bytes of UTF-8 encoded character
data.

However, the Java decoder does not currently adhere to this:

{code:title=org.apache.avro.io.BinaryDecoder}
  @Override
  public Utf8 readString(Utf8 old) throws IOException {
    int length = readInt();
    Utf8 result = (old != null ? old : new Utf8());
    result.setByteLength(length);
    if (0 != length) {
      doReadBytes(result.getBytes(), 0, length);
    }
    return result;
  }
{code}

The first thing the code does here is read an *int* value, not a *long*.  Because the length is
written with a variable-length encoding, this works for any length that fits in an int.  However,
there may be edge cases where the serializer writes a larger value, erroneously or nefariously,
and decoding breaks.  Let us detect such cases, handle them gracefully, and adhere more closely
to the spec.
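
One possible shape for the check is sketched below. It is a standalone illustration, not a patch
against BinaryDecoder: the class StringLengthSketch, its methods, and the MAX_STRING_BYTES cap are
all hypothetical names chosen for this example. It reads the length as a zig-zag variable-length
*long*, as the spec describes, and rejects negative or oversized values before allocating anything.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Avro.
final class StringLengthSketch {

  // Assumed cap: a Java array cannot hold more than Integer.MAX_VALUE elements,
  // and some JVMs allow slightly fewer, so leave a small margin.
  static final long MAX_STRING_BYTES = Integer.MAX_VALUE - 8L;

  // Reads a variable-length, zig-zag encoded long (at most 10 bytes).
  static long readLong(InputStream in) throws IOException {
    long raw = 0;
    int shift = 0;
    while (true) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("Stream ended inside a variable-length long");
      }
      raw |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break; // high bit clear: this was the last byte
      }
      shift += 7;
      if (shift > 63) {
        throw new IOException("Invalid long encoding");
      }
    }
    return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
  }

  // Reads the string length as a long and validates it before narrowing to int.
  static int readStringLength(InputStream in) throws IOException {
    long length = readLong(in);
    if (length < 0) {
      throw new IOException("Malformed data: negative string length " + length);
    }
    if (length > MAX_STRING_BYTES) {
      throw new IOException("Cannot read string of length " + length
          + "; maximum supported is " + MAX_STRING_BYTES);
    }
    return (int) length;
  }

  // Reads a length-prefixed UTF-8 string using the validated length.
  static String readString(InputStream in) throws IOException {
    int length = readStringLength(in);
    byte[] bytes = new byte[length];
    int off = 0;
    while (off < length) {
      int n = in.read(bytes, off, length - off);
      if (n < 0) {
        throw new EOFException("Stream ended after " + off + " of " + length + " bytes");
      }
      off += n;
    }
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

With this sketch, a stream holding the bytes 0x06 'f' 'o' 'o' decodes to "foo", while a length
prefix that decodes to 2^32 (bytes 0x80 0x80 0x80 0x80 0x20) is rejected with an IOException
instead of being narrowed to an int.  A real fix would likely want a much smaller, configurable
cap, since a declared length near the array limit can still exhaust memory.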



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
