avro-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-654) Recursive #validate() for union'ed schemas in Ruby cripples performance
Date Fri, 03 Sep 2010 17:41:36 GMT

    [ https://issues.apache.org/jira/browse/AVRO-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905980#action_12905980 ]

Doug Cutting commented on AVRO-654:
-----------------------------------

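Note that full, recursive validation is not required for union dispatch. Per the spec, a union may not contain more than one schema of the same type, except for the named types (record, fixed and enum), so the datum's top-level type, plus its name for named types, is enough to select the branch.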

http://avro.apache.org/docs/1.3.3/spec.html#Unions

So a typical implementation of a union writer might look something like:
{code}
writeUnion(datum, union) {
  int index = -1;
  // Pick the first branch whose type matches the datum's top-level type.
  for (int i = 0; index == -1 && i < union.length; i++) {
    switch (union[i].type) {
    case INT:
      if (datum is int)
        index = i;
      break;
    case LONG:
      if (datum is long)
        index = i;
      break;
    ... other unnamed types ...
    case RECORD:
      // Named types match on name only; no recursion into fields.
      if (datum is record && datum.name.equals(union[i].name))
        index = i;
      break;
    ... other named types ...
    }
  }
  writeInt(index);              // the branch index precedes the value
  write(datum, union[index]);
}
{code}
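
For the Ruby side, the same dispatch might be sketched roughly as below. This is only an illustration under stated assumptions: Branch and NamedRecord are hypothetical stand-ins (not classes from the avro gem), and it assumes a record datum can report the name of its schema, which a plain Hash cannot.

```ruby
# Hypothetical stand-ins for illustration; not the avro gem's classes.
Branch = Struct.new(:type, :name)          # one schema inside a union
NamedRecord = Struct.new(:name, :fields)   # a record datum that knows its schema name

INT_RANGE  = (-2**31)..(2**31 - 1)
LONG_RANGE = (-2**63)..(2**63 - 1)

# Returns the index of the first branch that matches the datum's top-level
# type. It never descends into record fields.
def union_branch_index(datum, union)
  index = union.index do |branch|
    case branch.type
    when 'null'    then datum.nil?
    when 'boolean' then datum == true || datum == false
    when 'int'     then datum.is_a?(Integer) && INT_RANGE.cover?(datum)
    when 'long'    then datum.is_a?(Integer) && LONG_RANGE.cover?(datum)
    when 'string'  then datum.is_a?(String)
    when 'record'  then datum.is_a?(NamedRecord) && datum.name == branch.name
    else false
    end
  end
  raise ArgumentError, "no union branch matches #{datum.inspect}" if index.nil?
  index
end

union = [Branch.new('null'), Branch.new('long'),
         Branch.new('record', 'Address'), Branch.new('record', 'Person')]
union_branch_index(NamedRecord.new('Person', {}), union)
```

The cost per write is one pass over the union's branches, independent of how deeply the chosen record nests.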

      

> Recursive #validate() for union'ed schemas in Ruby cripples performance
> -----------------------------------------------------------------------
>
>                 Key: AVRO-654
>                 URL: https://issues.apache.org/jira/browse/AVRO-654
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.3
>            Reporter: Philip (flip) Kromer
>
> The Ruby DatumWriter calls #validate() on each #write(). In the case of a schema with
> many nested unions (cf. Cassandra's*), this requires a recursive depth-first search to
> determine which branch to take. In Ruby, these operations are very expensive -- enough to
> limit write speeds to 2k/sec on a machine of moderate size.
> For repeated writing of the same data structure, one idea would be to create a
> CompiledDatumWriter. This would walk through the validation and assemble a tree of the
> methods to apply to each schema element in turn:
>   [ [:write_long, 'id'], [:write_bytes, 'name'], [:write_record, 'address', [:write_long, 'street']] ]
> ---
> * http://github.com/infochimps/cassandra/blob/beta1_plus_patches/interface/avro/cassandra.avpr
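
The CompiledDatumWriter idea in the description above could be sketched along these lines. Everything here is an assumption made for illustration: TinyEncoder, compile and run_plan are hypothetical names, schemas are plain Hashes, only long, bytes and nested records are handled (unions are left out for brevity), and the nested plan is a list of steps, a small variation on the shape shown in the description.

```ruby
# Hypothetical sketch; none of these names come from the avro gem.
class TinyEncoder
  attr_reader :out

  def initialize
    @out = String.new   # empty binary (ASCII-8BIT) buffer
  end

  # Avro long: zig-zag encode, then base-128 varint, low-order group first.
  def write_long(n)
    n = (n << 1) ^ (n >> 63)
    while (n & ~0x7F) != 0
      @out << ((n & 0x7F) | 0x80).chr
      n >>= 7
    end
    @out << n.chr
  end

  # Avro bytes: the length as a long, then the raw bytes.
  def write_bytes(s)
    write_long(s.bytesize)
    @out << s.b
  end
end

PRIMITIVE_WRITERS = { 'long' => :write_long, 'bytes' => :write_bytes }.freeze

# Walk the record schema once and return a plan of [method, field, sub_plan].
def compile(record_schema)
  record_schema.fetch('fields').map do |field|
    type = field.fetch('type')
    if type.is_a?(Hash) && type['type'] == 'record'
      [:write_record, field['name'], compile(type)]
    else
      [PRIMITIVE_WRITERS.fetch(type), field['name']]
    end
  end
end

# Apply a compiled plan to a datum; no schema inspection happens here.
def run_plan(plan, datum, encoder)
  plan.each do |op, name, sub_plan|
    if op == :write_record
      run_plan(sub_plan, datum.fetch(name), encoder)
    else
      encoder.public_send(op, datum.fetch(name))
    end
  end
  encoder
end

schema = {
  'type' => 'record', 'name' => 'Person',
  'fields' => [
    { 'name' => 'id', 'type' => 'long' },
    { 'name' => 'name', 'type' => 'bytes' },
    { 'name' => 'address',
      'type' => { 'type' => 'record', 'name' => 'Address',
                  'fields' => [{ 'name' => 'street', 'type' => 'long' }] } }
  ]
}
plan = compile(schema)   # built once, reused for every datum
run_plan(plan, { 'id' => 1, 'name' => 'ab', 'address' => { 'street' => -1 } }, TinyEncoder.new)
```

The schema walk happens once in compile; each subsequent write only iterates the precomputed steps.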

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

