avro-dev mailing list archives

From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1795) Python2: Cannot parse nested schemas
Date Thu, 04 Feb 2016 22:15:39 GMT
Jakob Homan created AVRO-1795:
---------------------------------

             Summary: Python2: Cannot parse nested schemas
                 Key: AVRO-1795
                 URL: https://issues.apache.org/jira/browse/AVRO-1795
             Project: Avro
          Issue Type: Bug
          Components: python
    Affects Versions: 1.8.0
            Reporter: Jakob Homan
            Assignee: Jakob Homan


In the Java client, one can parse nested schemas by loading the nested schema before the nesting
schema.  

For example, a header can be defined in one file:
{code:javascript}{ "namespace": "python.avro",
      "type": "record",
      "name": "header",
      "fields": [
         { "name": "header_field", "type": "string" }
       ]
    }{code}
and then included in another schema:
{code:javascript}{ "namespace": "python.avro",
      "type": "record",
      "name": "event",
      "fields": [
         {  "name": "header", "type": "python.avro.header" },
         {  "name": "event_field", "type": "string" }
      ]
    }{code}
As long as one instantiates a single Schema.Parser and parses the header first, the reference
to {{python.avro.header}} in the event schema is resolved correctly.

However, the Python client does not support this.  The {{parse}} function in {{schema.py}}
always instantiates a new Names object to hold the schemas, so named types defined in one
call are not visible to the next:
{code}def parse(json_string):
  """Constructs the Schema from the JSON text."""
  # TODO(hammer): preserve stack trace from JSON parse
  # parse the JSON
  try:
    json_data = json.loads(json_string)
  except:
    raise SchemaParseException('Error parsing JSON: %s' % json_string)

  # Initialize the names object
  names = Names()

  # construct the Avro Schema object
  return make_avsc_object(json_data, names){code}

Some possible fixes for this are:
1) Create a separate Parser class to mimic the Schema.Parser Java approach, while deprecating
the current parse method. 
2) Make Names a global variable used by the parse method, allowing multiple parse calls to
populate the same namespace.  This breaks current behavior (and at least one unit test depends
on it), so it would not be backwards compatible.
3) Create a new parse method that accepts a Names instance and returns both the schema and
that instance.  This keeps the code nice and functional while exposing the Names
class, which previously had not been particularly public.

I like the first approach.
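
A rough sketch of what the first approach might look like follows. It is a standalone toy rather than a patch against {{schema.py}}: the {{Parser}} class, its {{names}} dict, and the {{_resolve}} and {{_fullname}} helpers are all hypothetical, the exception class is redefined locally, and only records and primitive types are handled. The point is only that one parser instance keeps its named types between calls.

```python
import json

PRIMITIVES = frozenset(
    ["null", "boolean", "int", "long", "float", "double", "bytes", "string"])
TEXT = type(u"")  # unicode on Python 2, str on Python 3


class SchemaParseException(Exception):
    """Local stand-in for the exception of the same name in schema.py."""


class Parser(object):
    """Hypothetical parser that remembers named types across parse() calls."""

    def __init__(self):
        # Full name -> resolved record, shared by every parse() call.
        self.names = {}

    def parse(self, json_string):
        try:
            json_data = json.loads(json_string)
        except ValueError:
            raise SchemaParseException("Error parsing JSON: %s" % json_string)
        return self._resolve(json_data, None)

    def _resolve(self, node, namespace):
        if isinstance(node, TEXT):
            if node in PRIMITIVES:
                return node
            # Anything else must be a named type loaded earlier.
            fullname = self._fullname(node, namespace)
            if fullname not in self.names:
                raise SchemaParseException("Unknown named type: %s" % fullname)
            return self.names[fullname]
        if isinstance(node, dict) and node.get("type") == "record":
            namespace = node.get("namespace", namespace)
            fullname = self._fullname(node["name"], namespace)
            fields = [{"name": f["name"],
                       "type": self._resolve(f["type"], namespace)}
                      for f in node["fields"]]
            record = {"type": "record", "name": fullname, "fields": fields}
            # Registered only after the fields resolve, so a failed parse
            # leaves no partial entry (recursive records are not handled).
            self.names[fullname] = record
            return record
        raise SchemaParseException("Unsupported in this sketch: %r" % (node,))

    @staticmethod
    def _fullname(name, namespace):
        if "." in name or not namespace:
            return name
        return "%s.%s" % (namespace, name)


HEADER = """{ "namespace": "python.avro", "type": "record", "name": "header",
  "fields": [ { "name": "header_field", "type": "string" } ] }"""
EVENT = """{ "namespace": "python.avro", "type": "record", "name": "event",
  "fields": [ { "name": "header", "type": "python.avro.header" },
              { "name": "event_field", "type": "string" } ] }"""

parser = Parser()
header = parser.parse(HEADER)  # load the nested schema first
event = parser.parse(EVENT)    # the reference to header now resolves
```

With this shape the existing module-level {{parse}} could stay as it is for a deprecation period, since the shared state lives on the parser instance rather than in a global.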



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
