avro-user mailing list archives

From Francis Galiegue <fgalie...@gmail.com>
Subject Re: Limitations in enum symbols not mentioned in the spec
Date Wed, 27 Feb 2013 16:27:02 GMT
On Wed, Feb 27, 2013 at 5:24 PM, Francis Galiegue <fgaliegue@gmail.com> wrote:
> Hello,
>
> I have tried to parse this schema:
>
> {
>     "name": "gender",
>     "type": "enum",
>     "symbols": [ "MALE", "FEMALE", "WHO CARES?" ]
> }
>
> But the parser complains about an illegal character in the third symbol.
>
> The problem is, nothing in the spec as far as I can see says that the
> set of usable code points in a symbol is limited at all...
>
> So, what is this allowed set of code points?
>
> --
> Francis Galiegue, fgaliegue@gmail.com
> JSON Schema in Java: http://json-schema-validator.herokuapp.com

OK, a partial answer to my own question. The parser's name check does this:

    if (!(Character.isLetter(first) || first == '_'))
      throw new SchemaParseException("Illegal initial character: "+name);
    for (int i = 1; i < length; i++) {
      char c = name.charAt(i);
      if (!(Character.isLetterOrDigit(c) || c == '_'))
        throw new SchemaParseException("Illegal character in: "+name);
    }

It therefore means that any Unicode letter or digit, or the underscore, is
allowed anywhere, except in the first position, which must not be a
digit. So the following is legal:

[ "mémé", "dans", "les", "orties" ]

Right?
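
For anyone who wants to try symbols without going through the schema
parser, here is a minimal standalone sketch of the check quoted above. The
class name SymbolCheck and the method isValidName are hypothetical names of
mine, not part of Avro, and rejecting the empty string is my own assumption
since the excerpt does not show that case.

```java
public class SymbolCheck {

    // Hypothetical helper mirroring the quoted check: the first char must be
    // a letter or '_', every later char a letter, a digit or '_'.
    static boolean isValidName(String name) {
        if (name.isEmpty()) {
            return false; // assumption: the excerpt does not cover empty names
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "m\u00e9m\u00e9" is "mémé", escaped to avoid source-encoding issues.
        String[] samples = {
            "MALE", "FEMALE", "WHO CARES?", "m\u00e9m\u00e9", "_x", "1st"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValidName(s));
        }
    }
}
```

With this sketch, "WHO CARES?" is rejected because of the space (and the
question mark), "1st" is rejected because it starts with a digit, and the
other samples pass, including the accented one and the one with a leading
underscore.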

--
Francis Galiegue, fgaliegue@gmail.com
JSON Schema in Java: http://json-schema-validator.herokuapp.com
