avro-dev mailing list archives

From tjwp <...@git.apache.org>
Subject [GitHub] avro pull request #230: Ruby encoding performance improvements
Date Mon, 12 Jun 2017 17:09:43 GMT
GitHub user tjwp opened a pull request:

    https://github.com/apache/avro/pull/230

    Ruby encoding performance improvements

    This change optimizes the validation that the Ruby implementation performs during encoding.
For a use case with a few levels of nesting and unions in several places within the schema,
we saw a 5x improvement in encoding performance with these changes.
    
    The main changes are:
    
    1. Avoid the exhaustive validation of schemas in a union. Previously a datum was tested
against all schemas in a union even though the failures were unused if a compatible schema
was found. Now validation stops when the first compatible schema is found, but all failures
are still available if there is no compatible type.
    
    2. Avoid the repeated validation of nested schemas. Previously, the datum was recursively
validated against the schema prior to encoding. Then during encoding, each complex field (record,
array, map, union) was recursively validated again. Thus each field was validated a number
of times equal to its level of nesting plus one. This change introduces an option for validation
not to recurse. Since encoding proceeds recursively, validation is instead performed as each
level is encoded.
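
    As a rough illustration of item 1, the sketch below stops at the first compatible
schema in a union and only reports failures when nothing matches. It is not the code from
this pull request: `UnionSketch`, `first_compatible`, and the use of plain Ruby classes as
stand-ins for Avro schemas are all assumptions made for the example.

```ruby
# Hypothetical, simplified stand-in for union validation; not the avro gem's code.
# Each "schema" in the union is just a Ruby class the datum must be an instance of.
module UnionSketch
  # Returns [matched_schema, failures].
  # Stops at the first compatible schema and discards the failures seen so far;
  # if no schema matches, every failure is returned.
  def self.first_compatible(union_schemas, datum)
    failures = []
    union_schemas.each do |schema|
      return [schema, []] if datum.is_a?(schema)
      failures << "expected #{schema}, got #{datum.class}"
    end
    [nil, failures]
  end
end

# A string datum matches the second branch, so the third is never tested.
schema, failures = UnionSketch.first_compatible([NilClass, String, Integer], "abc")
```

    With this shape, the cost of a successful match depends on where the compatible
schema sits in the union rather than on the size of the union.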
    
    Other minor improvements:
    - delay creating error messages until they are required
    - use explicit instead of dynamic code (`&method(:is_a?)`)
    - additional use of constants
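
    The minor improvements above might look roughly like the following. This is a sketch
under assumptions, not the code in this pull request: `LazyFailures`, `check_int`, and
`INT_RANGE` are hypothetical names, and the exact call sites that used `&method(:is_a?)`
are not shown here.

```ruby
# Hypothetical failure collector that stores blocks instead of formatted strings.
class LazyFailures
  def initialize
    @pending = []
  end

  # Record a failure without building its message yet.
  def add(&message_builder)
    @pending << message_builder
  end

  def any?
    !@pending.empty?
  end

  # Messages are only built when a caller actually asks for them.
  def messages
    @pending.map(&:call)
  end
end

# A constant avoids rebuilding the range on every validation.
INT_RANGE = (-2**31..2**31 - 1).freeze

def check_int(datum, failures)
  # Explicit type test, instead of creating a Method object via &method(:is_a?).
  unless datum.is_a?(Integer) && INT_RANGE.cover?(datum)
    failures.add { "expected int, got #{datum.inspect}" }
  end
end

failures = LazyFailures.new
check_int("x", failures)
failures.messages if failures.any? # the string is interpolated only here
```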
    
    The only additional tests in this change demonstrate that validation without recursion
returns the same results for "simple" fields and no validation errors for complex fields that
would require recursion.
    
    `Avro::Schema.validate` and `Avro::SchemaValidator.validate!` now take an options hash
with the new `:recursive` option. An options hash was used in anticipation of eventually
combining this with logical type support (https://github.com/apache/avro/pull/116), which
would specify whether the datum is already `:encoded`.
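
    To make the effect of a `:recursive` option concrete, here is a minimal, self-contained
sketch. It does not use the avro gem: `RecursiveOptionSketch`, its toy schema format
(`:string`, `:int`, or a Hash of field name to schema standing in for a record), and the
`checks` counter are assumptions made for illustration. It also assumes that, without
recursion, complex types report no errors at all.

```ruby
# Hypothetical mini-validator and encoder; none of this is the avro gem's code.
module RecursiveOptionSketch
  class << self
    attr_accessor :checks # counts validate! calls, for illustration only
  end
  self.checks = 0

  def self.validate!(schema, datum, options = { recursive: true })
    self.checks += 1
    case schema
    when :string
      raise TypeError, "expected string, got #{datum.inspect}" unless datum.is_a?(String)
    when :int
      raise TypeError, "expected int, got #{datum.inspect}" unless datum.is_a?(Integer)
    when Hash
      # Assumption: without recursion, complex types report no errors here.
      return unless options[:recursive]
      raise TypeError, "expected record, got #{datum.inspect}" unless datum.is_a?(Hash)
      schema.each { |name, field| validate!(field, datum[name], options) }
    end
    nil
  end

  # Encoding already walks the structure, so each level validates only itself.
  def self.encode(schema, datum, out = [])
    validate!(schema, datum, recursive: false)
    if schema.is_a?(Hash)
      raise TypeError, "expected record, got #{datum.inspect}" unless datum.is_a?(Hash)
      schema.each { |name, field| encode(field, datum[name], out) }
    else
      out << datum
    end
    out
  end
end

schema = { "name" => :string, "address" => { "zip" => :int } }
datum  = { "name" => "a", "address" => { "zip" => 12345 } }
RecursiveOptionSketch.encode(schema, datum)
```

    In this sketch each node of the datum is validated once during encoding, instead of
once per enclosing level plus an initial full pass.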
    
    These changes have been tested against the following Ruby versions:
      - 1.9.3-p551
      - 2.0.0-p648
      - 2.1.10
      - 2.2.7
      - 2.3.4
      - 2.4.1

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/salsify/avro ruby-validation-perf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/avro/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #230
    
----
commit 97b350457b74a4b79b591f4e3d9b439a347fc5d7
Author: Tim Perkins <tperkins@salsify.com>
Date:   2017-06-12T16:34:59Z

    Ruby encoding performance improvements

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
