avro-dev mailing list archives

From "Thiruvalluvan M. G. (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-210) Memory leak with recursive schemas when constructed by hand
Date Wed, 18 Nov 2009 13:24:39 GMT

    [ https://issues.apache.org/jira/browse/AVRO-210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779440#action_12779440 ]

Thiruvalluvan M. G. commented on AVRO-210:

Any scheme that relies only on reference counting will leak when references are circular.
One partial solution is to use Boost's shared_ptr and weak_ptr. There are two kinds of
references between nodes in a schema: parent-to-child references and symbolic references.
We can use shared_ptr for the references from parents to children and weak_ptr for symbolic
references. The owning references then form no cycles, so nothing leaks.
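
A minimal sketch of that ownership scheme, for illustration only. It uses std::shared_ptr and std::weak_ptr (C++11) as stand-ins for the Boost types, and Node, makeLongList and liveAfterBuilding are hypothetical names, not part of the Avro C++ API. The live counter exists only to make leaks observable.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a schema node; this is not the real avro::Node.
struct Node {
    static int live;                               // nodes currently alive
    std::string name;
    std::vector<std::shared_ptr<Node>> children;   // parent-to-child: owning
    std::weak_ptr<Node> symbol;                    // symbolic reference: non-owning

    explicit Node(std::string n) : name(std::move(n)) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Builds LongList { value: long, next: union { null, LongList } }.
std::shared_ptr<Node> makeLongList() {
    auto rec = std::make_shared<Node>("LongList");
    rec->children.push_back(std::make_shared<Node>("long"));

    auto next = std::make_shared<Node>("union");
    next->children.push_back(std::make_shared<Node>("null"));

    // The back edge to the record is weak, so it does not keep rec alive.
    auto link = std::make_shared<Node>("LongList (symbolic)");
    link->symbol = rec;
    next->children.push_back(link);

    rec->children.push_back(next);
    return rec;
}

// Builds and drops n recursive schemas, then reports how many nodes remain.
int liveAfterBuilding(int n) {
    for (int i = 0; i < n; ++i) {
        auto root = makeLongList();   // released at the end of each iteration
    }
    return Node::live;
}
```

In this sketch, liveAfterBuilding returns 0 however many schemas are built. If symbol were a shared_ptr instead, each record would keep itself alive through the union and the count would grow with every iteration.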

But this solution has one problem in multi-threaded situations. If one thread holds an intermediate
node n1 in a temporary (say, during a schema walk) and another thread deletes the "root" node,
all ancestors of n1 will be destroyed. One of those destroyed nodes could be the target of a
weak pointer held by a descendant of n1, and that weak pointer then expires. So the thread
doing the schema walk will not see the whole schema.
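
The interleaving described here can be reproduced deterministically in a single thread. The sketch below is an assumption-laden illustration: SchemaNode and symbolResolves are hypothetical names, and std::shared_ptr and std::weak_ptr stand in for the Boost types. Resetting root plays the part of the other thread deleting the root node.

```cpp
#include <memory>
#include <vector>

// Hypothetical schema node: owning child pointers, weak symbolic references.
struct SchemaNode {
    std::vector<std::shared_ptr<SchemaNode>> children;
    std::weak_ptr<SchemaNode> symbol;
};

// Returns true if the symbolic reference below n1 still resolves.
// When releaseRoot is true, the root is dropped while n1 is still held.
bool symbolResolves(bool releaseRoot) {
    auto root = std::make_shared<SchemaNode>();
    auto n1 = std::make_shared<SchemaNode>();
    auto link = std::make_shared<SchemaNode>();

    link->symbol = root;             // symbolic reference back to the root
    n1->children.push_back(link);
    root->children.push_back(n1);

    // The walking thread's temporary: the only handle it keeps on the schema.
    std::shared_ptr<SchemaNode> held = n1;
    n1.reset();
    link.reset();

    if (releaseRoot) {
        root.reset();                // stands in for another thread deleting the root
    }

    // lock() yields an empty pointer once the target has been destroyed.
    return held->children[0]->symbol.lock() != nullptr;
}
```

Here symbolResolves(false) returns true and symbolResolves(true) returns false. The access stays memory-safe because lock() yields an empty pointer instead of a dangling one, but a walker has to check for that case.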

I suppose this will not be a big problem and we can live with it.

If there are no big objections to this approach, I'll submit a patch.

> Memory leak with recursive schemas when constructed by hand
> -----------------------------------------------------------
>                 Key: AVRO-210
>                 URL: https://issues.apache.org/jira/browse/AVRO-210
>             Project: Avro
>          Issue Type: Bug
>          Components: c++
>            Reporter: Thiruvalluvan M. G.
> A schema consists of one or more nodes. These nodes are referred to through intrusive
pointers (NodePtr). Since intrusive pointers use reference counts, recursive schemas,
which result in cycles of intrusive pointers, lead to a memory leak. The following code, when
compiled and run, causes memory use to grow steadily:
> {code:title=test.cc|borderStyle=solid}
> #include <unistd.h>
> #include "Schema.hh"
> int main(int argc, char** argv)
> {
>     const int count1 = 10;
>     const int count2 = 1000;
>     for (int i = 0; i < count1; i++) {
>         for (int j = 0; j < count2; j++) {
>             avro::RecordSchema rec("LongList");
>             rec.addField("value", avro::LongSchema());
>             avro::UnionSchema next;
>             next.addType(avro::NullSchema());
>             // The union refers back to rec, and rec then refers to the union: a cycle.
>             next.addType(rec);
>             rec.addField("next", next);
>             rec.addField("end", avro::BoolSchema());
>         }
>         sleep(1);
>     }
> }
> {code}
> The leak should not happen when we build the schema by parsing a JSON schema file. This
is because the current implementation does not use pointers for symbolic links; it uses symbols,
and a symbol table resolves them at runtime. Unfortunately, the nested
schema file currently generates an error. I'll file a separate JIRA for that.
