avro-dev mailing list archives

From "Doug Cutting (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1022) Error in validate name
Date Fri, 10 Feb 2012 18:39:00 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205616#comment-13205616 ]

Doug Cutting commented on AVRO-1022:
------------------------------------

An implementation would be naive to trust that other implementations have validated all names
in schemas it receives.  Java currently disables validation when reading a schema from a data
file, since it's more important to be able to read the data.  With the generic APIs, name validation
isn't required, and many applications use only the generic APIs.
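
To illustrate the strict-versus-lenient distinction above, here is a minimal sketch in Python.  It is not Avro's Java code: check_name, InvalidNameError and the validate switch are hypothetical names, and the pattern is the ASCII-only name rule from the Avro spec.

```python
import re

# Name rule from the Avro spec: start with [A-Za-z_], then only [A-Za-z0-9_].
_SPEC_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class InvalidNameError(ValueError):
    """Raised when a schema name fails validation (hypothetical type)."""


def check_name(name, validate=True):
    """Return name unchanged, raising only when validation is switched on."""
    if validate and not _SPEC_NAME.fullmatch(name):
        raise InvalidNameError("illegal name: %r" % name)
    return name


# Strict mode, e.g. when compiling a schema you control.
check_name("user_id")

# Lenient mode, e.g. when reading a schema embedded in a data file:
# being able to read the data matters more than rejecting the name.
check_name("größe", validate=False)
```

A reader built this way never refuses data because someone else's writer skipped validation.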

This would not require support for Unicode identifiers in programming languages.  A code generator
should escape any character in a name that's not easy for it to represent in an identifier.
We'd just be permitting code generators to take advantage of Unicode identifiers when a programming
language does support them.
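
A minimal sketch of the kind of escaping a code generator could apply, written in Python.  The function name escape_name and the _uXXXX_ escape format are assumptions made for illustration; they are not what any Avro code generator actually does.

```python
import re

# Characters that are assumed safe in an identifier in most target languages.
_SAFE = re.compile(r"[A-Za-z0-9_]")


def escape_name(name):
    """Map an arbitrary schema name to an ASCII-only identifier.

    Hypothetical scheme: every character outside [A-Za-z0-9_] becomes
    _uXXXX_, where XXXX is its code point in hex.
    """
    parts = []
    for ch in name:
        if _SAFE.fullmatch(ch):
            parts.append(ch)
        else:
            parts.append("_u{:04X}_".format(ord(ch)))
    escaped = "".join(parts)
    # Identifiers usually may not be empty or start with a digit.
    if not escaped or escaped[0].isdigit():
        escaped = "_" + escaped
    return escaped


print(escape_name("user_id"))
print(escape_name("größe"))
```

This scheme is not collision-free (a name that already contains the literal text _u00F6_ would clash with an escaped one), so a real generator would also need to escape its own escape prefix or detect duplicates.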

> If we went the other way (change the spec), we'd have to answer a bunch of design questions
> (decide what is a "letter," decide on normalization, figure out how to mangle names in various
> languages, etc.), and then implement validation in each language [ ... ]

I disagree.  Even if we removed all restrictions on naming I don't think we'd add much burden
to implementations.  Most implementations don't do code generation.  Code generators already
need to mangle names.  A code generator should already escape rather than die when it sees
an unexpected character in a name.  (The alternative is an inability to generate code for
schemas that someone else controls, a poor choice.)

So I don't see a new interoperability problem this would create.  We already have schemas
in the wild whose names are invalid.

Perhaps we should change the spec to recommend that names be restricted to ASCII for ease
of programming with generated APIs in all languages.  And we might check that in the compiler,
forcing folks to specify --escape-non-ASCII-names if they really want to generate code for
a schema whose names contain non-ASCII characters, to discourage the use of non-ASCII in schemas
that you do control.  In general we could encourage implementations both to not trust that
identifiers are all-ASCII and to encourage all-ASCII identifiers.
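
The compiler check proposed above might look roughly like the following Python sketch.  The --escape-non-ASCII-names option is only a proposal in this comment, and check_before_codegen and its escape_non_ascii parameter are hypothetical names standing in for it.

```python
def non_ascii_names(names):
    """Return the names that contain at least one non-ASCII character."""
    return [n for n in names if not n.isascii()]


def check_before_codegen(names, escape_non_ascii=False):
    """Refuse to generate code for non-ASCII names unless explicitly allowed.

    escape_non_ascii stands in for the proposed --escape-non-ASCII-names
    option.  Returns the names that the generator would need to escape.
    """
    offenders = non_ascii_names(names)
    if offenders and not escape_non_ascii:
        raise ValueError(
            "non-ASCII names %r; pass --escape-non-ASCII-names to "
            "generate code anyway" % offenders
        )
    return offenders


# All-ASCII schema: nothing to escape, no option needed.
check_before_codegen(["User", "id"])

# Schema someone else controls: opt in explicitly.
check_before_codegen(["größe"], escape_non_ascii=True)
```

The default discourages non-ASCII names in schemas you control, while the opt-in keeps code generation possible for schemas you don't.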
                
> Error in validate name
> ----------------------
>
>                 Key: AVRO-1022
>                 URL: https://issues.apache.org/jira/browse/AVRO-1022
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Raymie Stata
>            Priority: Minor
>         Attachments: AVRO-1022.patch
>
>
> Fix schema.validateName to allow only ASCII letters, not Unicode letters.


        
