avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-659) Portable specification of the location of schema and protocol files
Date Wed, 08 Sep 2010 16:26:35 GMT

    [ https://issues.apache.org/jira/browse/AVRO-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907288#action_12907288 ]

Doug Cutting commented on AVRO-659:

Jeff, I'm still trying to understand the use case you have in mind.

Most folks writing data to files should use an Avro data file, which includes the schema.
If folks are doing RPC, then the protocol they use to write data is typically a file in their
source code tree, and the protocol they use to read data is determined through the handshake.
If folks are writing individual records to a database then a best practice is to maintain
a registry of schemas used in the database as a separate table, and have each instance refer
to its schema in the registry via its MD5 hash. The application would still probably store
or create the schemas it uses for new database records with the source code. The registry
is updated when writing records and accessed when reading them.
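
To make the database case concrete, here is a minimal sketch in Python of a schema registry kept as a separate table and keyed by MD5 hash. It is only an illustration under stated assumptions: the table names (schemas, records), the helper names (schema_fingerprint, open_store, write_record, read_record) and the use of an in-memory SQLite database are all hypothetical, the payload is treated as opaque bytes rather than encoded with an Avro library, and the JSON normalization used before hashing is a simplification, not Avro's own canonical form.

```python
import hashlib
import json
import sqlite3


def schema_fingerprint(schema_json: str) -> str:
    """Return the MD5 hex digest of a schema's JSON text.

    Assumption: the text is normalized by re-serializing with sorted keys
    and no extra whitespace, so formatting differences do not change the hash.
    """
    normalized = json.dumps(json.loads(schema_json), sort_keys=True, separators=(",", ":"))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a schema registry table and a records table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schemas (md5 TEXT PRIMARY KEY, schema_json TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE records ("
        " id INTEGER PRIMARY KEY,"
        " schema_md5 TEXT NOT NULL REFERENCES schemas(md5),"
        " payload BLOB NOT NULL)"
    )
    return conn


def write_record(conn: sqlite3.Connection, schema_json: str, payload: bytes) -> int:
    """Register the writer's schema if it is new, then store the record with its hash."""
    md5 = schema_fingerprint(schema_json)
    # The registry is updated when writing; a schema already present is left alone.
    conn.execute(
        "INSERT OR IGNORE INTO schemas (md5, schema_json) VALUES (?, ?)",
        (md5, schema_json),
    )
    cur = conn.execute(
        "INSERT INTO records (schema_md5, payload) VALUES (?, ?)",
        (md5, payload),
    )
    return cur.lastrowid


def read_record(conn: sqlite3.Connection, record_id: int) -> tuple:
    """Return (schema_json, payload) for a record, looking the schema up by its hash."""
    row = conn.execute(
        "SELECT s.schema_json, r.payload FROM records r"
        " JOIN schemas s ON s.md5 = r.schema_md5 WHERE r.id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    return row[0], row[1]
```

A writer would call write_record with the schema text it keeps alongside its source code, and a reader would call read_record to get back both the payload and the schema that was used to write it, without needing to know where any schema file lives on disk.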

We do not want to encourage folks to write data without also storing the schema used to write
that data in the same repository as the data. I don't feel a path-based schema registry
is a good idea. Keeping a copy of the schema with source code that writes data is a good
practice: the schema is part of the writing code and should be versioned with it. Generating
schemas on the fly when writing data is a fine practice too. But whenever data is persisted,
its schema should be stored with it.

> Portable specification of the location of schema and protocol files
> -------------------------------------------------------------------
>                 Key: AVRO-659
>                 URL: https://issues.apache.org/jira/browse/AVRO-659
>             Project: Avro
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
> Avro doesn't require code generation, which is great. However, if you want to use a protocol
> or a schema, your code needs to know where to find it. When your code is ported to new systems,
> the protocol or schema file must be placed in the same place as on the previous system for
> things to work correctly.
> For importing modules in a portable fashion, Python provides a default set of places
> it will look for modules and an environment variable called PYTHONPATH that programs can use
> to override these defaults. It may be useful to explore similar constructs for Avro implementations
> that don't do code generation.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
