jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: Lifetime of revision identifiers
Date Wed, 11 Apr 2012 12:15:49 GMT

On 3.4.12 12:19, Dominique Pfister wrote:
> Hi,
> On Apr 3, 2012, at 12:50 PM, Jukka Zitting wrote:
>> Hi,
>> On Tue, Apr 3, 2012 at 11:56 AM, Dominique Pfister<dpfister@adobe.com>  wrote:
>>> On Apr 3, 2012, at 11:51 AM, Jukka Zitting wrote:
>>>> You'd drop revision identifiers from the MicroKernel interface? That's
>>>> a pretty big design change...
>>> No, I probably did not make myself clear: I would not keep a revision
>>> (and all its nodes) reachable in terms of garbage collection, simply
>>> because it was accessed by a client some time ago.
>> If that's the case, I'm worried about what could happen to code like this:
>>     String revision = mk.getHeadRevision();
>>     String root = mk.getNodes("/", revision);
>> Suppose someone else makes a commit in between the two calls and the
>> garbage collector gets triggered. The result then would be that the
>> getNodes() call will fail because the given revision identifier is no
>> longer available.
> If we have a delay of 10 minutes for revisions getting garbage collected, this would
> imply that 10 minutes passed between the first call and the second call, right? This
> seems rather unlikely.

This does actually *not imply* that 10 minutes pass between the calls. 
The first call might happen an arbitrarily short time before the garbage 
collector decides to remove that revision. The second call might thus 
try to retrieve a revision which has in the meantime been removed.
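To make the race concrete, here is a minimal sketch. The class and method names below are illustrative stand-ins for a revision store, not the real MicroKernel API, and the garbage collector is invoked by hand rather than on a timer:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory revision store (names are stand-ins, not the
// real org.apache.jackrabbit.mk API).
class RevisionStore {
    private final Map<String, String> revisions = new HashMap<>();
    private String head;

    String commit(String json) {
        head = "r" + (revisions.size() + 1);
        revisions.put(head, json);
        return head;
    }

    String getHeadRevision() {
        return head;
    }

    String getNodes(String path, String revision) {
        String json = revisions.get(revision);
        if (json == null) {
            throw new IllegalStateException(revision + " has been garbage-collected");
        }
        return json;
    }

    // Collect every revision except the current head, regardless of how
    // recently a client last read it.
    void gc() {
        revisions.keySet().removeIf(r -> !r.equals(head));
    }
}

public class StaleRevisionRace {
    public static void main(String[] args) {
        RevisionStore mk = new RevisionStore();
        mk.commit("{\"a\":{}}");

        String revision = mk.getHeadRevision(); // client reads the head...
        mk.commit("{\"a\":{},\"b\":{}}");       // ...another client commits...
        mk.gc();                                // ...and the collector runs.

        try {
            mk.getNodes("/", revision);         // fails: revision is gone
        } catch (IllegalStateException e) {
            System.out.println("stale: " + e.getMessage());
        }
    }
}
```

However small the gap between `getHeadRevision()` and `getNodes()`, the collector may fire inside it; no grace period eliminates the window, it only shrinks it.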


>> And if you consider that an unlikely enough scenario, consider a case
>> where I want to then page through a potentially large list of the
>> child nodes:
>>     int pageSize = 10;
>>     long count = mk.getChildNodeCount("/", revision);
>>     for (long offset = 0; offset < count; offset += pageSize) {
>>         String children = mk.getNodes("/", revision, 1, offset, pageSize, null);
>>     }
>> That could take a potentially long time, during which the revision
>> might well get garbage-collected. How should a client prepare for such
>> a situation?
> If simply iterating over this large list takes longer than the 10 minutes mentioned
> above, you'd REALLY have a lot of child nodes. And if the client does some work in
> between (or waits for some other user interaction to continue paging), I guess it must
> be able to handle this situation gracefully.
> I'm just worried about the other extreme: if you have a lot of such clients requesting
> large child node lists on different head revisions, the garbage collector will never be
> able to actually collect a revision and space will run out soon.
> Dominique
>> BR,
>> Jukka Zitting
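One way a client could handle the paging scenario gracefully, sketched under the assumption that a collected revision surfaces as an exception from getNodes (all class and method names here are illustrative stand-ins, not the real MicroKernel API): re-read the head revision and restart the loop, accepting that the result then reflects a newer snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store that "collects" the pinned revision once, partway
// through the iteration, to exercise the retry path.
class PagingStore {
    private final List<String> nodes;
    private String head = "r1";
    private int failuresToInject;

    PagingStore(List<String> nodes, int failuresToInject) {
        this.nodes = nodes;
        this.failuresToInject = failuresToInject;
    }

    String getHeadRevision() {
        return head;
    }

    long getChildNodeCount(String path, String revision) {
        return nodes.size();
    }

    List<String> getNodes(String path, String revision, long offset, int count) {
        if (failuresToInject > 0 && offset > 0) {
            failuresToInject--;
            head = "r2"; // head moved on; the old revision was collected
            throw new IllegalStateException(revision + " has been garbage-collected");
        }
        int from = (int) Math.min(offset, nodes.size());
        int to = (int) Math.min(offset + count, nodes.size());
        return new ArrayList<>(nodes.subList(from, to));
    }
}

public class RetryPaging {
    // Page through all children; on a collected revision, restart from
    // the new head rather than failing the whole read.
    static List<String> readAllChildren(PagingStore mk, int pageSize) {
        while (true) {
            String revision = mk.getHeadRevision();
            List<String> result = new ArrayList<>();
            try {
                long count = mk.getChildNodeCount("/", revision);
                for (long offset = 0; offset < count; offset += pageSize) {
                    result.addAll(mk.getNodes("/", revision, offset, pageSize));
                }
                return result;
            } catch (IllegalStateException stale) {
                // revision vanished mid-iteration: loop and retry on the new head
            }
        }
    }
}
```

The restart trades snapshot isolation for availability: the client no longer pins old revisions indefinitely, which also addresses the space concern raised above, but it must tolerate seeing a different snapshot after a retry.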
