jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6339) MapRecord#getKeys should initialize child iterables lazily
Date Fri, 01 Sep 2017 10:08:01 GMT

    [ https://issues.apache.org/jira/browse/OAK-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150323#comment-16150323 ]

Chetan Mehrotra commented on OAK-6339:
--------------------------------------

Backported to 1.6 (segment module only) in revision 1806918.

> MapRecord#getKeys should initialize child iterables lazily
> ----------------------------------------------------------
>
>                 Key: OAK-6339
>                 URL: https://issues.apache.org/jira/browse/OAK-6339
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Chetan Mehrotra
>            Assignee: Michael Dürig
>            Priority: Minor
>              Labels: candidate_oak_1_6
>             Fix For: 1.8, 1.7.3
>
>         Attachments: OAK-6339-1.6.patch
>
>
> Recently we hit an OutOfMemoryError using the [oakRepoStats|https://github.com/chetanmeh/oak-console-scripts/tree/master/src/main/groovy/repostats] script with a SegmentNodeStore setup where the uuid index has 16M+ entries, which creates a very flat hierarchy. The error occurred while computing the Tree#getChildren iterator, which internally invokes MapRecord#getKeys to obtain an iterable over the child node names.
> This happened because getKeys computes the key list eagerly: it calls bucket.getKeys() for each child bucket, which recursively does the same for that bucket's children, so the whole subtree is evaluated up front.
> {code}
>         if (isBranch(size, level)) {
>             List<MapRecord> buckets = getBucketList(segment);
>             List<Iterable<String>> keys =
>                     newArrayListWithCapacity(buckets.size());
>             for (MapRecord bucket : buckets) {
>                 // eager: getKeys() on each child runs now and recurses into its children
>                 keys.add(bucket.getKeys());
>             }
>             return concat(keys);
>         }
> {code}
> Instead, we should use the same approach as MapRecord#getEntries, i.e. evaluate the iterable for each child bucket lazily:
> {code}
>         if (isBranch(size, level)) {
>             List<MapRecord> buckets = getBucketList(segment);
>             List<Iterable<MapEntry>> entries =
>                     newArrayListWithCapacity(buckets.size());
>             for (final MapRecord bucket : buckets) {
>                 entries.add(new Iterable<MapEntry>() {
>                     @Override
>                     public Iterator<MapEntry> iterator() {
>                         // the child bucket is only expanded when iteration reaches it
>                         return bucket.getEntries(diffKey, diffValue).iterator();
>                     }
>                 });
>             }
>             return concat(entries);
>         }
> {code}
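The difference between the eager getKeys loop and the lazy getEntries pattern can be illustrated with a small plain-Java sketch. This is not Oak code: `Bucket`, `eagerKeys`, `lazyKeys` and the static `opened` counter are hypothetical, leaf buckets are assumed to copy their keys into a list, and `concat` is a hand-written lazy concatenation assumed to behave like the `concat` used in the snippets above (presumably Guava's `Iterables.concat`, judging by `newArrayListWithCapacity`). The counter records how many buckets have produced their keys so far.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical stand-in for a MapRecord bucket tree (not Oak's real class).
 * Branch buckets hold child buckets; leaf buckets hold keys.
 */
final class Bucket {
    static int opened = 0; // how many buckets have produced their keys so far

    private final List<Bucket> children;
    private final List<String> keys;

    private Bucket(List<Bucket> children, List<String> keys) {
        this.children = children;
        this.keys = keys;
    }

    static Bucket leaf(String... keys) {
        return new Bucket(Collections.emptyList(), List.of(keys));
    }

    static Bucket branch(Bucket... children) {
        return new Bucket(List.of(children), Collections.emptyList());
    }

    /** Mirrors the original getKeys loop: every child is expanded up front. */
    Iterable<String> eagerKeys() {
        opened++;
        if (children.isEmpty()) {
            return new ArrayList<>(keys); // assumption: a leaf materialises its keys
        }
        List<Iterable<String>> parts = new ArrayList<>(children.size());
        for (Bucket child : children) {
            parts.add(child.eagerKeys()); // recursion runs now, before any iteration
        }
        return concat(parts);
    }

    /** Mirrors the getEntries pattern: each child is wrapped, not expanded. */
    Iterable<String> lazyKeys() {
        opened++;
        if (children.isEmpty()) {
            return new ArrayList<>(keys);
        }
        List<Iterable<String>> parts = new ArrayList<>(children.size());
        for (Bucket child : children) {
            // Plays the role of the anonymous Iterable in getEntries:
            // child.lazyKeys() runs only when concat() asks for this part's iterator.
            parts.add(() -> child.lazyKeys().iterator());
        }
        return concat(parts);
    }

    /** Lazy concatenation, assumed to behave like the concat() used above. */
    static <T> Iterable<T> concat(List<Iterable<T>> parts) {
        return () -> new Iterator<T>() {
            private final Iterator<Iterable<T>> outer = parts.iterator();
            private Iterator<T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next non-empty part, creating its iterator on demand.
                while (!current.hasNext() && outer.hasNext()) {
                    current = outer.next().iterator();
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

For a root built as `branch(branch(leaf("a", "b"), leaf("c")), leaf("d"))`, `eagerKeys()` expands all five buckets before the caller reads a single key, while `lazyKeys()` expands only the root up front and three buckets by the time the first key is returned. With a map as flat as the 16M-entry uuid index, the lazy form should only need to hold the buckets along the current iteration path rather than every leaf's keys at once.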



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
