jackrabbit-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: ids and locality (was: [jr3] Sequential Node Ids)
Date Tue, 11 Jan 2011 08:50:43 GMT
Hi,

>the assumption that the sequence
>of creating nodes has a great overlap with node
>locality. this may be the case for certain creation
>patterns, but will not work well in general.

This is an assumption as well :-) For adding nodes, sequential node ids
are clearly faster (see the graph). For reading data, we just don't know
how sequential node ids behave for normal use cases, because we couldn't
test them so far. Now we can.
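
To make "sequential node ids" concrete, here is a minimal sketch in Java.
It is not the jr3 implementation: the class name SequentialIdSketch and
the choice to keep a counter in the low 64 bits of a java.util.UUID are
assumptions made only for illustration.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Jackrabbit code: hands out 128-bit ids in
// creation order instead of drawing them at random.
final class SequentialIdSketch {
    // Assumption: a single in-memory counter is enough for the illustration.
    private final AtomicLong next = new AtomicLong();

    // The high 64 bits stay 0; the counter fills the low 64 bits, so ids
    // created later compare greater than ids created earlier.
    UUID nextId() {
        return new UUID(0L, next.getAndIncrement());
    }

    public static void main(String[] args) {
        SequentialIdSketch generator = new SequentialIdSketch();
        UUID first = generator.nextId();
        UUID second = generator.nextId();
        System.out.println(first + " sorts before " + second + ": "
                + (first.compareTo(second) < 0));
    }
}
```

With ids like these, nodes created one after another get adjacent keys.
Whether that adjacency also helps when reading is the open question.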

>I think we should leverage the locality information
>that is available in the path of an item and use
>this information also in the persistence layer.

That's a nice idea, but there are some problems. Node ids are currently
fixed in size (128 bits), so we can't use the path, or a function of the
path, as the id. That rules out something like 'hierarchy ids':
http://msdn.microsoft.com/en-us/library/bb677290.aspx - a related
solution is the nested set model,
http://en.wikipedia.org/wiki/Nested_set_model - but this also requires
variable size keys. For testing purposes, we may be able to implement a
node id generator that generates hierarchy ids, but it would fail for
certain use cases (where the hierarchy id doesn't fit in 128 bits).
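
As a rough illustration of that limit, here is a sketch of a hierarchy id
packed into 128 bits. Everything in it is an assumption made for the
example: the class name HierarchyIdSketch, the fixed 16 bits per level,
and the use of 1-based child positions with 0 as padding. It is neither
the SQL Server encoding nor anything that exists in Jackrabbit.

```java
import java.math.BigInteger;

// Hypothetical sketch: encodes the position of a node in the tree as a
// fixed-size 128-bit number, 16 bits per level.
final class HierarchyIdSketch {
    static final int ID_BITS = 128;
    static final int BITS_PER_LEVEL = 16;                     // assumption
    static final int MAX_LEVELS = ID_BITS / BITS_PER_LEVEL;   // 8 levels
    static final int MAX_POSITION = (1 << BITS_PER_LEVEL) - 1;

    // childPositions lists, from the root down, the 1-based position of
    // each node on the path among its siblings. Unused levels are padded
    // with 0, so a parent sorts before its children and a whole subtree
    // sorts before the next sibling.
    static BigInteger encode(int... childPositions) {
        if (childPositions.length > MAX_LEVELS) {
            throw new IllegalArgumentException(
                    "depth " + childPositions.length + " exceeds " + MAX_LEVELS);
        }
        BigInteger id = BigInteger.ZERO;
        for (int level = 0; level < MAX_LEVELS; level++) {
            int position = 0; // padding for levels below the node
            if (level < childPositions.length) {
                position = childPositions[level];
                if (position < 1 || position > MAX_POSITION) {
                    throw new IllegalArgumentException(
                            "position " + position + " does not fit in "
                            + BITS_PER_LEVEL + " bits");
                }
            }
            id = id.shiftLeft(BITS_PER_LEVEL).or(BigInteger.valueOf(position));
        }
        return id;
    }

    public static void main(String[] args) {
        BigInteger parent = encode(1);
        BigInteger child = encode(1, 1);
        BigInteger nextSibling = encode(2);
        System.out.println(parent.compareTo(child) < 0);
        System.out.println(child.compareTo(nextSibling) < 0);
        try {
            encode(1, 1, 1, 1, 1, 1, 1, 1, 1); // nine levels
        } catch (IllegalArgumentException e) {
            System.out.println("does not fit: " + e.getMessage());
        }
    }
}
```

In this sketch a path deeper than 8 levels, or a node with more than
65535 siblings, cannot be encoded. A variable-length encoding would move
those limits but not remove them as long as the id stays at 128 bits.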

>while sequential ids give you a hint when nodes
>were created, paths provide you a much better
>hint on locality. 

This is an assumption as well, and it would need to be tested. Sometimes
related data is stored in different branches. It is hard to say which
method is 'better' or 'faster' without a way to test it.

Regards,
Thomas

