jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject Re: MongoMK: deleted nodes
Date Thu, 10 Jan 2013 11:01:30 GMT

On 10.1.13 10:53, Thomas Mueller wrote:
> Hi,
> A possible use case for a bloom filter :-)
> Could you tell us more about where in the code it would help?

KernelNodeStore in oak-core might be a good place. This is where the 
positive cache for KernelNodeState resides. So I think it would sense to 
handle the negative case here as well. Since the revision is part of the 
key here, there is no problem with updating the cache also.


> One thing bloom filters are not so good is when items need to be removed
> later on. So if we keep a bloom filter for removed node (speeding up the
> case where a node wasn't deleted for sure), then the problem might be how
> to change the bloom filter if a node is re-added later on. A possible
> solution for this is using a "counting bloom filter" but this one uses
> more memory. If the filter is for one revision only then this isn't a
> problem.
> Regards,
> Thomas
> On 1/10/13 11:26 AM, "Michael Dürig" <mduerig@apache.org> wrote:
>> Hi,
>> This is only loosely related but, checking for nodes which don't exist
>> is expensive across the stack. It currently always results in a full
>> round trip. I think we should at some point come up with some kind of
>> negative cache. Using a bloom filter for this would only add very little
>> memory overhead but might save us quite some round trips.
>> Michael
>> On 10.1.13 10:07, Marcel Reutegger wrote:
>>> Hi,
>>> I was just thinking the way how MongoMK handles deleted nodes and
>>> I'm wondering if we should change it to make reading nodes less
>>> expensive.
>>> I understand reading a node requires checking all ancestors of that node
>>> to find out if it really exists. This is already done that way in
>>> NodeExistsCommand
>>> but it looks like it's missing in FetchNodesActionNew. To me it looks
>>> like
>>> we will have to change FetchNodesActionNew accordingly. But that means
>>> reading will become quite expensive.
>>> Would it be better to mark a node as deleted in the revision it was
>>> deleted?
>>> Reading a node in a given revision would return the node and we'd have
>>> to
>>> add logic to check the new flag. But we'd then immediately know whether
>>> the node exists in that revision without consulting ancestor nodes.
>>> regards
>>>    marcel

View raw message