avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
Date Tue, 07 May 2013 20:51:15 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651288#comment-13651288
] 

Scott Carey commented on AVRO-1325:
-----------------------------------

Below are the limitations that concern me from AVRO-1274, in approximate priority of my concern.

# Arbitrary properties are not supported; for example, {"type":"string", "avro.java.string":"String"}
cannot be built.
# SchemaBuilder.INT and other constants are public.  Unfortunately, these are mutable, so
anyone can call addProp() on them, which affects every other user of the same constant.
# Scopes are confusing, it is not always obvious when a 
# Does not chain to nested types.  Although there is limited chaining for record fields, nested
calls to the builder are required, which prevents supporting namespace nesting or passing other
context from outer to inner scopes.
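
The second limitation can be illustrated with a minimal, self-contained sketch. ToySchema and ToyBuilder are hypothetical stand-ins, not the Avro classes; only the idea of calling addProp() on a shared public constant comes from the list above. The property name "hint" is also made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a schema object; NOT org.apache.avro.Schema.
final class ToySchema {
    private final String type;
    private final Map<String, String> props = new LinkedHashMap<>();

    ToySchema(String type) { this.type = type; }

    // Mutates this instance in place.
    void addProp(String key, String value) { props.put(key, value); }

    String getProp(String key) { return props.get(key); }

    String getType() { return type; }
}

// Hypothetical builder exposing both styles side by side.
final class ToyBuilder {
    // Risky: a single mutable instance shared by every caller.
    static final ToySchema INT = new ToySchema("int");

    // Safer: a fresh instance per call, so mutations stay local.
    static ToySchema intType() { return new ToySchema("int"); }
}

final class SharedConstantDemo {
    // Caller A mutates the constant; caller B then reads the same object.
    static String leakedProp() {
        ToyBuilder.INT.addProp("hint", "set-by-caller-A");
        return ToyBuilder.INT.getProp("hint");
    }

    // Caller A mutates its own instance; caller B gets a clean one.
    static String freshProp() {
        ToyBuilder.intType().addProp("hint", "set-by-caller-A");
        return ToyBuilder.intType().getProp("hint");
    }
}
```

In this sketch the factory method avoids the leak at the cost of one allocation per call. Whether the prototype patch takes that route or another is not stated here.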


I have a prototype patch that builds on the work in AVRO-1274.  The major changes are to how
scopes are handled for fields and unions.  Adding property support on top of AVRO-1274 is not
trivial, because it is ambiguous what a call to add a property would apply to (the field, or
the type of the field?).

The following schema:
{code:json}
  {"type":"record","name":"HandshakeRequest","namespace":"org.apache.avro.ipc","fields":[
    {"name":"clientHash","type":{"type":"fixed","name":"MD5","size":16}},
    {"name":"clientProtocol","type":[
      "null",
      {"type":"string","avro.java.string":"String"}]},
    {"name":"serverHash","type":"MD5"},
    {"name":"meta","type":[
      "null",
      {"type":"map","values":"bytes","avro.java.string":"String"}]}
  ]}
{code}
looks like this in the builder:
{code}
  Schema result = SchemaBuilder
    .recordType("HandshakeRequest").namespace("org.apache.avro.ipc").fields()
      .name("clientHash").type().fixed("MD5").size(16).noDefault()
      .name("clientProtocol").type().unionOf()
        .nullType().and()
        .stringWith().prop("avro.java.string", "String").endString().endUnion().noDefault()
      .name("serverHash").type("MD5")
      .name("meta").type().unionOf()
        .nullType().and()
        .map().prop("avro.java.string", "String").values().bytesType().endUnion().withDefault(null)
      .record();
{code}

It supports the same feature set that JSON schemas do:
  * nesting of namespaces ("MD5" above automatically picks up the "org.apache.avro.ipc" namespace)
  * reference to named types by name (.type("MD5") above for serverHash)
It also enforces other rules:
  * a union default must match the first type in the union
  * properties, doc(), namespace, and aliases work only in the contexts where they are supported.
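
As a rough sketch of the union default rule above, the check below compares a default value against the first branch of a union. UnionDefaultCheck, typeNameOf and isValidDefault are hypothetical names, and only a handful of primitive types are handled; this is not how the prototype patch implements the rule.

```java
import java.util.List;

// Hypothetical helper; not part of Avro or of the prototype patch.
final class UnionDefaultCheck {
    // Maps a Java default value to an Avro primitive type name (small subset only).
    static String typeNameOf(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return "string";
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Boolean) return "boolean";
        throw new IllegalArgumentException("unsupported default: " + value);
    }

    // A default is accepted only if it matches the FIRST branch of the union.
    static boolean isValidDefault(List<String> unionBranches, Object defaultValue) {
        return !unionBranches.isEmpty()
            && unionBranches.get(0).equals(typeNameOf(defaultValue));
    }
}
```

With branches ["null", "string"], a null default passes and a string default is rejected; reversing the branch order reverses the outcome.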


Supported features are scoped with many internal nested types.  For example, the field assembler
returned by the record builder's fields() method has only two methods: name(String) and
record().  name(String) returns a type builder for a field, which has prop(String, String)
for the field and methods for the available types, such as map().  A call to map() returns a
map builder, which has prop(String, String) again, but for the map.  values() ends the use of
the map builder, changing scope to the nested type and returning to the fields assembler when
that is complete.
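
The scoping idea can be sketched in a few nested classes. Everything below is a hypothetical toy that emits a JSON string; the class names, intType(), bytesType() on the values scope, and the "order" property in the usage note are assumptions for illustration and do not reproduce the prototype patch. The point is only that each scope exposes the methods valid in it, so prop() is unambiguous.

```java
// Toy illustration of scoped builders; not the AVRO-1325 implementation.
final class ScopedBuilderSketch {
    static FieldAssembler fields(String recordName) {
        return new FieldAssembler(recordName);
    }

    // Scope 1: only name(String) and record() are available here.
    static final class FieldAssembler {
        private final String recordName;
        private final StringBuilder out = new StringBuilder();
        private boolean first = true;

        FieldAssembler(String recordName) { this.recordName = recordName; }

        FieldTypeBuilder name(String fieldName) {
            return new FieldTypeBuilder(this, fieldName);
        }

        String record() {
            return "{\"type\":\"record\",\"name\":\"" + recordName
                + "\",\"fields\":[" + out + "]}";
        }

        private FieldAssembler addField(String json) {
            if (!first) out.append(',');
            first = false;
            out.append(json);
            return this;
        }
    }

    // Scope 2: prop() applies to the FIELD; type methods leave this scope.
    static final class FieldTypeBuilder {
        private final FieldAssembler parent;
        private final String fieldName;
        private final StringBuilder fieldProps = new StringBuilder();

        FieldTypeBuilder(FieldAssembler parent, String fieldName) {
            this.parent = parent;
            this.fieldName = fieldName;
        }

        FieldTypeBuilder prop(String key, String value) {
            fieldProps.append(",\"").append(key).append("\":\"").append(value).append('"');
            return this;
        }

        FieldAssembler intType() { return finish("\"int\""); }

        MapBuilder map() { return new MapBuilder(this); }

        private FieldAssembler finish(String typeJson) {
            return parent.addField("{\"name\":\"" + fieldName + "\",\"type\":"
                + typeJson + fieldProps + "}");
        }
    }

    // Scope 3: prop() applies to the MAP; values() ends the map scope.
    static final class MapBuilder {
        private final FieldTypeBuilder field;
        private final StringBuilder mapProps = new StringBuilder();

        MapBuilder(FieldTypeBuilder field) { this.field = field; }

        MapBuilder prop(String key, String value) {
            mapProps.append(",\"").append(key).append("\":\"").append(value).append('"');
            return this;
        }

        ValuesBuilder values() { return new ValuesBuilder(this); }
    }

    // Scope 4: choosing the value type returns to the field assembler.
    static final class ValuesBuilder {
        private final MapBuilder map;

        ValuesBuilder(MapBuilder map) { this.map = map; }

        FieldAssembler bytesType() {
            return map.field.finish("{\"type\":\"map\",\"values\":\"bytes\""
                + map.mapProps + "}");
        }
    }
}
```

A chain such as fields("R").name("meta").prop("order", "ignore").map().prop("avro.java.string", "String").values().bytesType().record() attaches "order" to the field and "avro.java.string" to the map, because each prop() call is resolved by the static type of the scope it is called on. Calling record() directly after name(...) does not compile in this sketch, which is the kind of misuse the scoping is meant to rule out.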


h4. Remaining Work
* Not all primitive types are supported yet (trivial)
* Shortcut methods need to be added for common use cases such as an optional field.
* Naming of some things needs review. It would be easier if enum, int, long, default, etc.
were not reserved Java keywords :)
* Javadoc is nearly absent.
* There is some room for pushing more common work into parent types.
* Tests
* Attempt to replace the Schema.Parser logic with it, at minimum to test for areas of improvement
or missing features.
* No protocol support yet (e.g. error, protocol, request, response).  It probably makes sense
to extend this to cover all Avro things, including fields and protocols.

I want to checkpoint the work so far and gather feedback.
                
> Enhanced Schema Builder API
> ---------------------------
>
>                 Key: AVRO-1325
>                 URL: https://issues.apache.org/jira/browse/AVRO-1325
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.7.5
>
>
> The schema builder from AVRO-1274 has a few key limitations.  I have proposed changes
to make before it is released and the public API is locked in.

