jackrabbit-oak-dev mailing list archives

From Dominique Pfister <dpfis...@adobe.com>
Subject Re: Lifetime of revision identifiers
Date Tue, 03 Apr 2012 11:19:54 GMT
Hi,

On Apr 3, 2012, at 12:50 PM, Jukka Zitting wrote:

> Hi,
> 
> On Tue, Apr 3, 2012 at 11:56 AM, Dominique Pfister <dpfister@adobe.com> wrote:
>> On Apr 3, 2012, at 11:51 AM, Jukka Zitting wrote:
>>> You'd drop revision identifiers from the MicroKernel interface? That's
>>> a pretty big design change...
>> 
>> No, I probably did not make myself clear: I would not keep a revision
>> (and all its nodes) reachable in terms of garbage collection, simply
>> because it was accessed by a client some time ago.
> 
> If that's the case, I'm worried about what could happen to code like this:
> 
>    String revision = mk.getHeadRevision();
>    String root = mk.getNodes("/", revision);
> 
> Suppose someone else makes a commit in between the two calls and the
> garbage collector gets triggered. The result then would be that the
> getNodes() call will fail because the given revision identifier is no
> longer available.

If we have a delay of 10 minutes before revisions get garbage collected, the failure you
describe would imply that more than 10 minutes passed between the first call and the second,
right? That seems rather unlikely.

> 
> And if you consider that an unlikely enough scenario, consider a case
> where I want to then page through a potentially large list of the
> child nodes:
> 
>    int page_size = 10;
>    long count = mk.getChildNodeCount("/", revision);
>    for (long offset = 0; offset < count; offset += page_size) {
>        String children = mk.getNodes("/", revision, 1, offset, page_size, null);
>    }
> 
> That could take a potentially long time, during which the revision
> might well get garbage-collected. How should a client prepare for such
> a situation?

If simply iterating over this large list takes longer than the 10 minutes mentioned above,
you'd REALLY have a lot of child nodes. And if the client does some work in between (or
waits for some other user interaction before continuing to page), I guess it must be able to
handle this situation gracefully.
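
Just to illustrate what I mean by "gracefully", here's a rough sketch of how such a paging
loop could recover. This is only my assumption of how it might look: it assumes that
getNodes() throws a MicroKernelException once the revision has been collected, and that
simply restarting from the new head revision is acceptable for the client:

    int pageSize = 10;
    String revision = mk.getHeadRevision();
    long count = mk.getChildNodeCount("/", revision);
    long offset = 0;
    while (offset < count) {
        try {
            String children = mk.getNodes("/", revision, 1, offset, pageSize, null);
            // process this page of children ...
            offset += pageSize;
        } catch (MicroKernelException e) {
            // the revision was collected in the meantime:
            // pick up the new head and start paging over again
            revision = mk.getHeadRevision();
            count = mk.getChildNodeCount("/", revision);
            offset = 0;
        }
    }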

I'm just worried about the other extreme: if you have a lot of such clients requesting large
child node lists on different head revisions, the garbage collector will never actually be
able to collect a revision, and space will soon run out.

Dominique

> 
> BR,
> 
> Jukka Zitting

