avro-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1402) Support for DECIMAL type
Date Tue, 08 Apr 2014 13:28:20 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962954#comment-13962954 ]

Tom White commented on AVRO-1402:
---------------------------------

There has been some further (offline) discussion about whether it would be possible to store
the scale in the Avro schema rather than in the data, for efficiency. Something like:

{code}
{
  "type":"record",
  "name":"org.apache.avro.FixedDecimal",
  "fields": [{
    "name":"value",
    "type":"bytes"
  }],
  "scale":"2",
  "precision":"4"
}
{code}
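To illustrate the efficiency argument, here is a toy sketch (not the committed implementation) of how a decimal would be serialized when the scale lives in the schema: only the unscaled two's-complement bytes need to be written per datum.

```python
from decimal import Decimal

def encode_fixed_decimal(d: Decimal, scale: int) -> bytes:
    # Shift the decimal point right by `scale` to get the unscaled integer,
    # then emit its big-endian two's-complement bytes.
    unscaled = int(d.scaleb(scale))
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return unscaled.to_bytes(length, "big", signed=True)

def decode_fixed_decimal(b: bytes, scale: int) -> Decimal:
    # `scale` is recovered from the schema, not from the data.
    unscaled = int.from_bytes(b, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)
```

For a FixedDecimal(4, 2), the value 12.34 is stored as the two bytes of the unscaled integer 1234; no per-record scale is written.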

In the implementation committed here, the name does not uniquely determine the RecordMapping,
so a FixedDecimal(4, 2) has a different RecordMapping from a FixedDecimal(3, 0). GenericData
keeps a map from name to RecordMapping, so org.apache.avro.FixedDecimal could map to either
FixedDecimalRecordMapping(4, 2) or FixedDecimalRecordMapping(3, 0), but not both.

We could solve this problem by having a stateless FixedDecimalRecordMapping and having the
read and write methods pass through the record schema to get the scale. However, consider
the case where there are multiple decimals (with different scales) in a single schema. Since
you can’t redefine a type multiple times (http://avro.apache.org/docs/1.7.6/spec.html#Names),
the first one serves as the definition, and later ones are just references:

{code}
{"type":"record","name":"rec","fields":[
  {"name":"dec1","type":{"type":"record","name":"org.apache.avro.FixedDecimal","fields":[{"name":"value","type":"bytes"}],"scale":"2","precision":"4"}},
  {"name":"dec2","type":"org.apache.avro.FixedDecimal","precision":"3","scale":"0"}
]} 
{code}

When GenericDatumReader/Writer is processing dec2, the scale it sees is 2, not 0, since
the read/write methods see the record schema, not the field-level properties. I can't see a
simple way around this.
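To make the resolution problem concrete, here is a toy sketch (a hypothetical parser, not Avro's actual code) of how a name-to-schema map makes the second use resolve to the first definition:

```python
# Named types are registered on first definition; later uses by name resolve
# to that single definition, so the "scale":"0" written next to dec2 never
# becomes part of the resolved schema.
named_types = {}

def resolve(schema):
    if isinstance(schema, str):              # a reference, e.g. "org.apache.avro.FixedDecimal"
        return named_types[schema]
    named_types[schema["name"]] = schema     # the first use defines the type
    return schema

dec1_type = resolve({"type": "record", "name": "org.apache.avro.FixedDecimal",
                     "fields": [{"name": "value", "type": "bytes"}],
                     "scale": "2", "precision": "4"})
dec2_type = resolve("org.apache.avro.FixedDecimal")
print(dec2_type["scale"])                    # "2" -- not the 0 declared on the dec2 field
```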

Note that in the Decimal schema committed in this JIRA we allow maxPrecision and maxScale
values to be specified as JSON properties that are not interpreted by Avro. E.g.

{code}
{"type":"record","name":"rec","fields":[
  {"name":"dec1","type":{"type":"record","name":”org.apache.avro.Decimal","fields":[{"name":"scale","type":"int"},{"name":"value","type":"bytes"}],"maxPrecision":"4","maxScale":"2"}},
  {"name":"dec2","type":"org.apache.avro.Decimal","maxPrecision":"3","maxScale":"0"}
]}
{code}

As it stands, an application using this extra metadata has to be careful to read the
JSON properties either from the field (if they are present there) or from the org.apache.avro.Decimal
record type. This is something we might improve - e.g. by having the metadata only as field-level
properties, not as part of the record definition. That would work for Hive.
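The careful lookup described above could be sketched like this (a hypothetical helper; the property names match the schema in this comment):

```python
def decimal_metadata(field_props: dict, record_props: dict) -> dict:
    # Prefer field-level JSON properties (dec2's "maxPrecision"/"maxScale"),
    # falling back to those on the org.apache.avro.Decimal record definition
    # (dec1's case, where the properties live on the named type itself).
    out = {}
    for key in ("maxPrecision", "maxScale"):
        if key in field_props:
            out[key] = field_props[key]
        elif key in record_props:
            out[key] = record_props[key]
    return out
```

For dec2, the field-level values win: `decimal_metadata({"maxPrecision": "3", "maxScale": "0"}, {"maxPrecision": "4", "maxScale": "2"})` yields precision 3, scale 0.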

> Support for DECIMAL type
> ------------------------
>
>                 Key: AVRO-1402
>                 URL: https://issues.apache.org/jira/browse/AVRO-1402
>             Project: Avro
>          Issue Type: New Feature
>    Affects Versions: 1.7.5
>            Reporter: Mariano Dominguez
>            Assignee: Tom White
>            Priority: Minor
>              Labels: Hive
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch, AVRO-1402.patch,
> UnixEpochRecordMapping.patch
>
>
> Currently, Avro does not seem to support a DECIMAL type or equivalent.
> http://avro.apache.org/docs/1.7.5/spec.html#schema_primitive
> Adding DECIMAL support would be particularly interesting when converting types from Avro
> to Hive, since DECIMAL is already a supported data type in Hive (0.11.0).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
