avro-dev mailing list archives

From "Andy Coates (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AVRO-2078) Avro does not enforce schema resolution rules for Decimal type
Date Tue, 26 Sep 2017 15:45:00 GMT

    [ https://issues.apache.org/jira/browse/AVRO-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180950#comment-16180950 ]

Andy Coates edited comment on AVRO-2078 at 9/26/17 3:44 PM:
------------------------------------------------------------

This is a particularly nasty bug, as it can easily lead to data corruption.  If you write decimal
"1.2345" with a writer schema with a scale of 4 and then deserialize with a reader schema with a
scale of 3, the value comes out as "12.345"!


{code:java}
@Test
    public void shouldThrowIfExistingFieldChangesType() throws Exception {
        GenericData genericData = new GenericData();
        genericData.addLogicalTypeConversion(new Conversions.DecimalConversion());

        final Schema v1 = Schema.createRecord("thing", "", "namespace", false, ImmutableList.of(
                new Schema.Field("decimal",
                        LogicalTypes.decimal(3, 3).addToSchema(Schema.create(Schema.Type.BYTES)),
                        "", Schema.NULL_VALUE)
        ));

        final Schema v2 = Schema.createRecord("thing", "", "namespace", false, ImmutableList.of(
                new Schema.Field("decimal",
                        LogicalTypes.decimal(6, 4).addToSchema(Schema.create(Schema.Type.BYTES)),
                        "", Schema.NULL_VALUE)
        ));

        final GenericData.Record recordV2 = new GenericData.Record(v2);
        recordV2.put("decimal", new BigDecimal("1.2345"));

        ByteBuffer bytes = serialize(genericData, recordV2);

        final GenericRecord deserialized = deserialize(genericData, v1, v2, bytes);
        final Object result = deserialized.get("decimal");

        // Below fails because result is 'new BigDecimal("12.345")'
        assertThat(result, is (new BigDecimal("1.2345")));
    }

    private ByteBuffer serialize(final GenericData genericData, final GenericData.Record recordV2)
            throws java.io.IOException {
        ByteBufferOutputStream output = new ByteBufferOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(output, null);
        DatumWriter<IndexedRecord> datumWriter = genericData.createDatumWriter(recordV2.getSchema());
        datumWriter.write(recordV2, encoder);
        encoder.flush();

        return output.getBufferList().get(0);
    }

    private GenericRecord deserialize(final GenericData genericData, final Schema v1, final Schema v2,
            final ByteBuffer bytes) throws java.io.IOException {
        ByteBufferInputStream input = new ByteBufferInputStream(bytes);
        final DatumReader<GenericRecord> datumReader = genericData.createDatumReader(v2, v1);
        return datumReader.read(new GenericData.Record(v1),
                DecoderFactory.get().binaryDecoder(input, null));
    }
    }
{code}
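
The corruption described above can also be seen without Avro at all. The decimal logical type stores only the unscaled integer, so a reader that applies a different scale silently shifts the decimal point. The following is a minimal sketch of that effect using only java.math; the class name DecimalScaleDemo and the method reinterpret are made up for illustration and are not part of Avro.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical demo class for illustration; not part of Avro.
class DecimalScaleDemo {

    // Mimics the effect of the bug: only the unscaled integer is carried over,
    // and the reader applies whatever scale its own schema declares.
    static BigDecimal reinterpret(BigDecimal written, int readerScale) {
        BigInteger unscaled = written.unscaledValue();   // 12345 for "1.2345"
        return new BigDecimal(unscaled, readerScale);
    }

    public static void main(String[] args) {
        BigDecimal written = new BigDecimal("1.2345");   // scale 4
        System.out.println(reinterpret(written, 4));     // prints 1.2345
        System.out.println(reinterpret(written, 3));     // prints 12.345
    }
}
```

Reading with the writer's scale of 4 returns the original value; reading with a scale of 3 returns a value ten times larger, with no error raised.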



was (Author: bigandy):
This is a particularly nasty bug as it can easily lead to data corruption.  If you write decimal
"1.2345" with a write schema with a scale of 4 and then deserialize with a scale of 3, the
value comes out as "12.345"!!!!

> Avro does not enforce schema resolution rules for Decimal type
> --------------------------------------------------------------
>
>                 Key: AVRO-2078
>                 URL: https://issues.apache.org/jira/browse/AVRO-2078
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Anthony Hsu
>            Assignee: Nandor Kollar
>             Fix For: 1.8.2
>
>         Attachments: dec.avro
>
>
> According to http://avro.apache.org/docs/1.8.2/spec.html#Decimal
> bq. For the purposes of schema resolution, two schemas that are {{decimal}} logical types _match_ if their scales and precisions match.
> This is not enforced.
> I wrote a file with (precision 5, scale 2) and tried to read it with a reader schema with (precision 3, scale 1). I expected an AvroTypeException to be thrown, but none was thrown.
> Test data file attached. The code to read it is:
> {noformat:title=ReadDecimal.java}
> import java.io.File;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DatumReader;
> public class ReadDecimal {
>   public static void main(String[] args) throws Exception {
>     Schema schema = new Schema.Parser().parse("{\n" + "  \"type\" : \"record\",\n" + "  \"name\" : \"some_schema\",\n"
>         + "  \"namespace\" : \"com.howdy\",\n" + "  \"fields\" : [ {\n" + "    \"name\" : \"name\",\n"
>         + "    \"type\" : \"string\"\n" + "  }, {\n" + "    \"name\" : \"value\",\n" + "    \"type\" : {\n"
>         + "      \"type\" : \"bytes\",\n" + "      \"logicalType\" : \"decimal\",\n" + "      \"precision\" : 3,\n"
>         + "      \"scale\" : 1\n" + "    }\n" + "  } ]\n" + "}");
>     DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
>     // dec.avro has precision 5, scale 2
>     DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(
>         new File("/tmp/dec.avro"), datumReader);
>     GenericRecord foo = null;
>     while (dataFileReader.hasNext()) {
>       // AvroTypeException expected due to change in scale/precision but none occurs
>       foo = dataFileReader.next(foo);
>     }
>   }
> }
> {noformat}
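
The matching rule quoted from the spec amounts to a simple equality check on precision and scale. The following is a rough sketch of such a check, not Avro's actual resolution code: DecimalSpec and requireMatch are hypothetical names, and IllegalArgumentException stands in for the AvroTypeException the reporter expected.

```java
// Hypothetical stand-in for a decimal logical type; not an Avro class.
final class DecimalSpec {
    final int precision;
    final int scale;

    DecimalSpec(int precision, int scale) {
        this.precision = precision;
        this.scale = scale;
    }

    // Applies the spec's rule: two decimal types match only if both
    // precision and scale are equal. Throws otherwise.
    // IllegalArgumentException is an assumption made for this sketch.
    static void requireMatch(DecimalSpec writer, DecimalSpec reader) {
        if (writer.precision != reader.precision || writer.scale != reader.scale) {
            throw new IllegalArgumentException(
                    "decimal mismatch: writer (precision " + writer.precision
                    + ", scale " + writer.scale + ") vs reader (precision "
                    + reader.precision + ", scale " + reader.scale + ")");
        }
    }

    public static void main(String[] args) {
        requireMatch(new DecimalSpec(5, 2), new DecimalSpec(5, 2));  // passes
        requireMatch(new DecimalSpec(5, 2), new DecimalSpec(3, 1));  // throws
    }
}
```

With the file from the report (precision 5, scale 2) and the reader schema (precision 3, scale 1), a check of this kind would reject the read instead of returning a shifted value.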



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
