Return-Path: X-Original-To: apmail-avro-dev-archive@www.apache.org Delivered-To: apmail-avro-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8DF47D306 for ; Fri, 14 Sep 2012 22:38:10 +0000 (UTC) Received: (qmail 64627 invoked by uid 500); 14 Sep 2012 22:38:09 -0000 Delivered-To: apmail-avro-dev-archive@avro.apache.org Received: (qmail 64570 invoked by uid 500); 14 Sep 2012 22:38:09 -0000 Mailing-List: contact dev-help@avro.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@avro.apache.org Delivered-To: mailing list dev@avro.apache.org Received: (qmail 64510 invoked by uid 99); 14 Sep 2012 22:38:09 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Sep 2012 22:38:09 +0000 Date: Sat, 15 Sep 2012 09:38:09 +1100 (NCT) From: "Vivek Nadkarni (JIRA)" To: dev@avro.apache.org Message-ID: <883629281.82861.1347662289360.JavaMail.jiratomcat@arcas> In-Reply-To: <1805664371.82820.1347661628202.JavaMail.jiratomcat@arcas> Subject: [jira] [Updated] (AVRO-1167) Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/AVRO-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Nadkarni updated AVRO-1167: --------------------------------- Attachment: AVRO-1167-TEST.patch The test in AVRO-1167-TEST.patch shows the memory leak caused by avro_schema_copy() when an AVRO_LINK is copied. I have provided a non-recursive schema containing an AVRO_LINK in this test. Once AVRO-766 is fixed, this issue should be revisited for recursive schemas as well. The valgrind output is below: ==15602== 1,410 (24 direct, 1,386 indirect) bytes in 1 blocks are definitely lost in loss record 23 of 23 ==15602== at 0x4C28F9F: malloc (vg_replace_malloc.c:236) ==15602== by 0x4C29019: realloc (vg_replace_malloc.c:525) ==15602== by 0x40D364: avro_default_allocator (allocation.c:36) ==15602== by 0x405428: avro_schema_link (schema.c:663) ==15602== by 0x406946: avro_schema_copy (schema.c:1260) ==15602== by 0x406722: avro_schema_copy (schema.c:1165) ==15602== by 0x403EEA: main (test_avro_1167.c:74) ==15602== ==15602== LEAK SUMMARY: ==15602== definitely lost: 96 bytes in 3 blocks ==15602== indirectly lost: 2,748 bytes in 28 blocks ==15602== possibly lost: 0 bytes in 0 blocks ==15602== still reachable: 0 bytes in 0 blocks ==15602== suppressed: 0 bytes in 0 blocks ==15602== ==15602== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 4 from 4) > Avro-C: avro_schema_copy() does not copy AVRO_LINKs properly. > ------------------------------------------------------------- > > Key: AVRO-1167 > URL: https://issues.apache.org/jira/browse/AVRO-1167 > Project: Avro > Issue Type: Bug > Components: c > Affects Versions: 1.7.1 > Environment: Ubuntu Linux 11.10 > Reporter: Vivek Nadkarni > Attachments: AVRO-1167-TEST.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > When avro_schema_copy() encounters an AVRO_LINK from an old_schema to a new_schema, it sets the target of the new_link to the target of the old_link in the old_schema. Thus, the AVRO_LINK in the new_schema points to an element in the old_schema. > While this is currently safe, since the reference count of the target in the old_schema is incremented, we are not really making a "copy" of the schema. > There is a "TODO" in the code, which says that we should make a avro_schema_copy() of the target in old_schema instead of linking directly to it. However, this solution of making a copy would result in a few problems: > 1. Avro schemas are intended to be self-contained. That implies that AVRO_LINKs are intended to be internal links inside a self-contained schema. The code introduces unnecessary (and potentially disallowed) external dependencies in an Avro schema. > 2. The purpose of copying a schema that we want to decouple the old_schema from the new_schema. The two copies may have different owners, we may want to deallocate old schema etc. > 3. If the schema is recursive, then the code would enter an infinite recursion loop. > It appears to me that the "correct" solution would be to replicate the entire structure of the current schema, including the internal links. This means that if old_link_A points to old_target_B, then new_link_A should point to new_target_B in the new schema. Note that there should only be one copy of new_target_B in the new schema, even if there are multiple links pointing to new_target_B - i.e. we should not make a new copy for each link. > In order to implement this proper copying of links, we would need to keep a lookup table of pairs of old and new schemas as they are being created, as well as a list of all the AVRO_LINKs that are copied. Then as a post-copy step, we would go and fix up all the AVRO_LINKs to point to the appropriate targets. This is the way the schema is constructed in the first place in avro_schema_from_json(). > An inefficient way to obtain the correct result from avro_schema_copy() would be to perform an avro_schema_to_json() followed by an avro_schema_from_json(). > Note: I have not implemented a fix for this issue, but I am documenting this issue in AVRO-JIRA because this issue needs to be resolved before AVRO-766 can be fixed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira