jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: Lifetime of revision identifiers
Date Tue, 03 Apr 2012 11:35:24 GMT

On 3.4.12 12:19, Dominique Pfister wrote:
> Hi,
> On Apr 3, 2012, at 12:50 PM, Jukka Zitting wrote:
>> Hi,
>> On Tue, Apr 3, 2012 at 11:56 AM, Dominique Pfister<dpfister@adobe.com>  wrote:
>>> On Apr 3, 2012, at 11:51 AM, Jukka Zitting wrote:
>>>> You'd drop revision identifiers from the MicroKernel interface? That's
>>>> a pretty big design change...
>>> No, I probably did not make myself clear: I would not keep a revision
>>> (and all its nodes) reachable in terms of garbage collection, simply
>>> because it was accessed by a client some time ago.
>> If that's the case, I'm worried about what could happen to code like this:
>>     String revision = mk.getHeadRevision();
>>     String root = mk.getNodes("/", revision);
>> Suppose someone else makes a commit in between the two calls and the
>> garbage collector gets triggered. The result then would be that the
>> getNodes() call will fail because the given revision identifier is no
>> longer available.
> If we have a delay of 10 minutes for revisions getting garbage collected, this would
> imply that 10 minutes passed between the first call and the second call, right? This seems
> rather unlikely.

10 minutes (like any fixed value) seems quite arbitrary to me. I wouldn't want 
to fix deployments by fiddling around with this setting. Rather, clients should 
be empowered to specify how long they need a certain revision (e.g. through a 
lease model, as Jukka proposed).
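
To make the lease idea a bit more concrete, here is a minimal sketch of what 
I have in mind. Note that RevisionLeases, lease() and isCollectable() are 
hypothetical names made up for illustration; nothing like this exists in the 
MicroKernel interface today, and a real design would probably track leases 
per client rather than per revision.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical lease registry (an assumption, not part of the MicroKernel API).
 * Clients state how long they need a revision, and the garbage collector
 * only reclaims revisions that have no live lease.
 */
class RevisionLeases {
    // revision identifier -> lease expiry time in milliseconds
    private final Map<String, Long> expiry = new ConcurrentHashMap<>();
    // injectable time source so the behaviour does not depend on the wall clock
    private final LongSupplier clock;

    RevisionLeases(LongSupplier clock) {
        this.clock = clock;
    }

    /** Acquire or extend a lease; an existing lease is never shortened. */
    void lease(String revision, long durationMillis) {
        long until = clock.getAsLong() + durationMillis;
        expiry.merge(revision, until, Long::max);
    }

    /** The garbage collector would ask this before reclaiming a revision. */
    boolean isCollectable(String revision) {
        Long until = expiry.get(revision);
        return until == null || until <= clock.getAsLong();
    }
}
```

A client paging through a large child node list would take a lease right 
after getHeadRevision() and renew it between pages, so the retention time 
follows what the client actually needs instead of a global interval.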

>> And if you consider that an unlikely enough scenario, consider a case
>> where I want to then page through a potentially large list of the
>> child nodes:
>>     int page_size = 10;
>>     long count = getChildNodeCount(root);
>>     for (long offset = 0; offset < count; offset += page_size) {
>>         String children = mk.getNodes("/", revision, 1, offset, page_size, null);
>>     }
>> That could take a potentially long time, during which the revision
>> might well get garbage-collected. How should a client prepare for such
>> a situation?
> If simply iterating over this large list takes longer than the 10 minutes mentioned above,
> you'd REALLY have a lot of child nodes. And if the client does some work in between (or
> waits for some other user interaction to continue paging), I guess it must be able to handle
> this situation gracefully.
> I'm just worried about the other extreme: if you have a lot of such clients requesting
> large child node lists on different head revisions, the garbage collector will never be able
> to actually collect a revision and space will run out soon.

Do we have evidence on how fast things will grow? To me this feels very 
much like premature optimisation.

If a deployment runs out of space because the client application holds 
on to too many revisions for too long, this can be fixed by optimising 
the client and adjusting the store size to the actual client's needs.

If, OTOH, clients fail because of an overly eager garbage collector, you 
will have to play dice with that 10-minute interval *and* increase the 
store size.


> Dominique
>> BR,
>> Jukka Zitting
