avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vivek Nadkarni (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1167) Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.
Date Fri, 14 Sep 2012 22:38:09 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vivek Nadkarni updated AVRO-1167:
---------------------------------

    Attachment: AVRO-1167-TEST.patch

The test in AVRO-1167-TEST.patch shows the memory leak caused by avro_schema_copy() when an
AVRO_LINK is copied. I have provided a non-recursive schema containing an AVRO_LINK in this
test. Once AVRO-766 is fixed, this issue should be revisited for recursive schemas as well.

The valgrind output is below:

==15602== 1,410 (24 direct, 1,386 indirect) bytes in 1 blocks are definitely lost in loss
record 23 of 23
==15602==    at 0x4C28F9F: malloc (vg_replace_malloc.c:236)
==15602==    by 0x4C29019: realloc (vg_replace_malloc.c:525)
==15602==    by 0x40D364: avro_default_allocator (allocation.c:36)
==15602==    by 0x405428: avro_schema_link (schema.c:663)
==15602==    by 0x406946: avro_schema_copy (schema.c:1260)
==15602==    by 0x406722: avro_schema_copy (schema.c:1165)
==15602==    by 0x403EEA: main (test_avro_1167.c:74)
==15602== 
==15602== LEAK SUMMARY:
==15602==    definitely lost: 96 bytes in 3 blocks
==15602==    indirectly lost: 2,748 bytes in 28 blocks
==15602==      possibly lost: 0 bytes in 0 blocks
==15602==    still reachable: 0 bytes in 0 blocks
==15602==         suppressed: 0 bytes in 0 blocks
==15602== 
==15602== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 4 from 4)


                
> Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly.
> -------------------------------------------------------------
>
>                 Key: AVRO-1167
>                 URL: https://issues.apache.org/jira/browse/AVRO-1167
>             Project: Avro
>          Issue Type: Bug
>          Components: c
>    Affects Versions: 1.7.1
>         Environment: Ubuntu Linux 11.10
>            Reporter: Vivek Nadkarni
>         Attachments: AVRO-1167-TEST.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a new_schema, it
sets the target of the new_link to the target of the old_link in the old_schema. Thus, the
AVRO_LINK in the new_schema points to an element in the old_schema. 
> While this is currently safe, since the reference count of the target in the old_schema
is incremented, we are not really making a "copy" of the schema.
> There is a "TODO" in the code, which says that we should make a avro_schema_copy() of
the target in old_schema instead of linking directly to it. However, this solution of making
a copy would result in a few problems:
> 1. Avro schemas are intended to be self-contained. That implies that AVRO_LINKs are intended
to be internal links inside a self-contained schema. The code introduces unnecessary (and
potentially disallowed) external dependencies in an Avro schema. 
> 2. The purpose of copying a schema that we want to decouple the old_schema from the new_schema.
The two copies may have different owners, we may want to deallocate old schema etc.
> 3. If the schema is recursive, then the code would enter an infinite recursion loop.
> It appears to me that the "correct" solution would be to replicate the entire structure
of the current schema, including the internal links. This means that if old_link_A points
to old_target_B, then new_link_A should point to new_target_B in the new schema. Note that
there should only be one copy of new_target_B in the new schema, even if there are multiple
links pointing to new_target_B - i.e. we should not make a new copy for each link.
> In order to implement this proper copying of links, we would need to keep a lookup table
of pairs of old and new schemas as they are being created, as well as a list of all the AVRO_LINKs
that are copied. Then as a post-copy step, we would go and fix up all the AVRO_LINKs to point
to the appropriate targets. This is the way the schema is constructed in the first place in
avro_schema_from_json().
> An inefficient way to obtain the correct result from avro_schema_copy() would be to perform
an avro_schema_to_json() followed by an avro_schema_from_json().
> Note: I have not implemented a fix for this issue, but I am documenting this issue in
AVRO-JIRA because this issue needs to be resolved before AVRO-766 can be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message