avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas
Date Mon, 06 May 2013 18:56:16 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649984#comment-13649984 ]

Doug Cutting commented on AVRO-1316:
------------------------------------

We might optimize the parse methods a bit so that:
 - calls with a single string don't copy that string;
 - calls with multiple strings are not quadratic.

This might look something like:
{code}
public Schema parse(String json, String... moreJson) {
  if (moreJson.length > 0) {
    StringBuilder b = new StringBuilder(json);
    for (String s : moreJson)
      b.append(s);
    json = b.toString();
  }
  ...
}
{code}
and similarly for Protocol.

Also, with varargs we can get rid of the _part variables, since the full string is no longer
a compile-time constant. The template could then contain just something like:

{code}
public static final org.apache.avro.Schema SCHEMA$ 
  = new org.apache.avro.Schema.Parser().parse(${this.javaSplit($schema.toString())});
{code}

Here javaSplit is defined to split the string, add escapes to each chunk, then insert commas and quotes.
That would minimize template logic, making it simpler for folks who have alternate templates.

The javaSplit logic will be simpler if we split before escaping. We just need to be sure
to split into chunks small enough that escapes, UTF-8 encoding, etc. won't push any of them past
the 64k limit. We might choose something as low as 1k characters to be safe. We can then loop,
calling substring() to break out chunks. As we create each chunk we can escape it and append it
to a StringBuilder that's initialized with the opening quote, inserting quote-comma-quote between
chunks and adding a final quote at the end.
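
A rough sketch of what javaSplit could look like, following the steps above. This is only an illustration, not the patch: the class name, the CHUNK_SIZE constant, and the javaEscape helper shown here are hypothetical stand-ins, and the escaping rules (backslash, quote, CR/LF, and \uXXXX for anything outside printable ASCII) are an assumption about what the generated source needs.

```java
// Hypothetical sketch; names and escaping rules are assumptions, not Avro's actual code.
final class JavaSplitSketch {
  // Chunk size in chars, chosen well below the 64k limit so escaping cannot overflow it.
  static final int CHUNK_SIZE = 1000;

  // Escape one chunk so it can sit between double quotes in Java source.
  static String javaEscape(String s) {
    StringBuilder b = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '\\': b.append("\\\\"); break;
        case '"':  b.append("\\\""); break;
        case '\n': b.append("\\n");  break;
        case '\r': b.append("\\r");  break;
        default:
          if (c < 0x20 || c > 0x7e) {
            // Non-printable or non-ASCII: emit a unicode escape (at most 6 chars per char).
            b.append(String.format("\\u%04x", (int) c));
          } else {
            b.append(c);
          }
      }
    }
    return b.toString();
  }

  // Split first, then escape each chunk, joining chunks with quote-comma-quote.
  static String javaSplit(String s) {
    StringBuilder b = new StringBuilder("\"");        // opening quote
    for (int i = 0; i < s.length(); i += CHUNK_SIZE) {
      if (i != 0) {
        b.append("\",\"");                            // quote-comma-quote between chunks
      }
      int end = Math.min(s.length(), i + CHUNK_SIZE);
      b.append(javaEscape(s.substring(i, end)));
    }
    b.append("\"");                                   // final quote
    return b.toString();
  }
}
```

Because splitting happens on the unescaped string, an escape sequence can never be cut in half, and with these rules a 1000-char chunk grows to at most 6000 chars, which stays far below the limit. The result is meant to be dropped directly into the argument list of parse(...), so an empty input still yields one empty literal.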
                
> IDL code-generation generates too-long literals for very large schemas
> ----------------------------------------------------------------------
>
>                 Key: AVRO-1316
>                 URL: https://issues.apache.org/jira/browse/AVRO-1316
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Jeremy Kahn
>            Priority: Minor
>              Labels: patch
>         Attachments: AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch
>
>
> When I work from a very large IDL schema, the Java code generated includes a schema JSON
> literal that exceeds the length of the maximum allowed literal string ([65535 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
> This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] constant string too long}}.
> It might seem a little crazy, but a 64-kilobyte JSON protocol isn't outrageous at all
> for some of the more involved data structures, especially if we're including documentation
> strings etc.
> I believe the fix should be a bit more sensitivity to the length of the JSON literal
> (and a willingness to split it into more than one literal, joined by {{+}}), but I haven't
> figured out where that change needs to go. Has anyone else encountered this problem?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira