Message-ID: <47BFF870.7040804@gmx.de>
Date: Sat, 23 Feb 2008 11:41:52 +0100
From: Julian Reschke <julian.reschke@gmx.de>
To: dev@jackrabbit.apache.org
Subject: Re: [jira] Commented: (JCR-1405) SPI: Introduce
 NodeInfo.getChildInfos()
References: <1984270253.1203760159309.JavaMail.jira@brutus>
In-Reply-To: <1984270253.1203760159309.JavaMail.jira@brutus>

...continuing on the mailing list; I think this exceeds what an issue 
tracker is good for.

angela (JIRA) wrote:
> angela commented on JCR-1405:
> -----------------------------
> 
> julian:
> 
> if you cant determine the childinfos upon creating the nodeinfo you should (as stated by the javadoc) simply return null,

In practice, determining the child infos means asking the underlying 
storage for them. If there are 1000 children and I get an internal 
error on the last one, I would then have to return null, which means 
that JCR2SPI asks again, using RepositoryService. Not good.

The only case where this new method would actually help is where the set 
of child node names is known in advance, such as for nodes of type 
nt:file. It's nice to be able to optimize those, but not sufficient.

We started the discussion because of the horrific performance of JCR2SPI 
for large collections (where it currently reaches something around 2% of 
what my persistence layer can do). Are we still trying to solve this?

> if you cant build the nodeinfo due to some exceptional situation you should throw upon getNodeInfo or getItemInfos
> respectively.
> 
> the exception with repositoryservice getChildInfo means the same as the one defined with getNodeInfo or getItemInfos:
> - the target node does not exist (any more) in the persistent state
> - the persistent layer cant be accessed or something similar.

Well. If the *construction* of the NodeInfo now requires deciding 
whether to return child infos or not, then this change doesn't help, 
because it doesn't scale for large collections.

I'm not going to retrieve child information unless somebody asks for it 
-- and that is when NodeInfo.getChildInfos is called, not when the 
NodeInfo is constructed.
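
To illustrate the lazy approach described above, here is a minimal 
sketch. It assumes simplified, hypothetical types (ChildStore, 
LazyNodeInfo) that are not part of the Jackrabbit SPI; child entries 
are reduced to plain names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the storage layer; not a Jackrabbit type. */
interface ChildStore {
    List<String> listChildNames(String parentPath);
}

/** Simplified node info that defers the child lookup until it is requested. */
final class LazyNodeInfo {
    private final String path;
    private final ChildStore store;
    private List<String> children; // stays null until the first request

    LazyNodeInfo(String path, ChildStore store) {
        // Construction does not touch the storage, so it stays cheap
        // regardless of how many children the node has.
        this.path = path;
        this.store = store;
    }

    String getPath() {
        return path;
    }

    synchronized List<String> getChildNames() {
        if (children == null) {
            // The storage is asked only here, and only once.
            children = Collections.unmodifiableList(
                    new ArrayList<>(store.listChildNames(path)));
        }
        return children;
    }
}
```

With this shape, a failure in the storage surfaces when the children 
are requested, not when the node info is built.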

> therefore i am with marcels explanation how nodeinfo should be created and work.
> 
> in addition, if you decide to do some lazy loading of the childinfos upon NodeInfo.getChildInfos (or upon RepositoryService.getChildInfos) the exception from my point of view is not raised upon building the iterator but upon retrieving the next element.... and there you wont be able to throw repository exception either.

...which may be an indication that a generic Iterator is not the right 
thing to use either.
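
As a sketch of what an alternative to a generic Iterator could look 
like: the interface below declares a checked exception on its methods, 
so a lazy implementation can report a storage failure while iterating. 
All names here (StoreException, ChildNameIterator, FailingAtIterator) 
are hypothetical; StoreException merely stands in for a repository 
exception.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical checked exception standing in for a repository exception. */
class StoreException extends Exception {
    StoreException(String message) {
        super(message);
    }
}

/** Hypothetical iterator whose methods may report storage failures. */
interface ChildNameIterator {
    boolean hasNext() throws StoreException;

    String next() throws StoreException;
}

/** Demo implementation that fails at a given index, as a lazy read might. */
final class FailingAtIterator implements ChildNameIterator {
    private final Iterator<String> delegate;
    private final int failAt;
    private int index;

    FailingAtIterator(List<String> names, int failAt) {
        this.delegate = names.iterator();
        this.failAt = failAt;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public String next() throws StoreException {
        if (index == failAt) {
            // The caller sees a checked exception instead of a null
            // result or an unchecked runtime failure.
            throw new StoreException("storage error at child " + index);
        }
        index++;
        return delegate.next();
    }
}
```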

> regarding "large":
> this is just one obvious example what could be a reason for the implementation NOT to reveal
> the child infos upon NodeInfo.getChildInfos. and the description mentions this as example.

Again: I started this discussion because of the performance for large 
collections. You seem to try to solve an entirely different problem -- 
do we have any data that indicates that it's worth solving?

How exactly is it better than batch read?

> that it states: if the impl is not willing.
> 
> Not willing means that the SPI implementation decides upon internal rules whether the
> childinfos are included or not. examples: the impl. decides
> 
> - based on the internal structure of the persistent layer in general
> - based on the cost of retrieving childinfos (given the potential chance of never being asked for)

See -- that's the problem. It seems to me what we really need is a way 
to indicate that the children *will* be needed.
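
One possible shape for such an indication, purely as a sketch: the 
caller passes a hint along with the read request, and the 
implementation includes the children only when asked to. Nothing like 
ChildHint, HintedNodeInfo or HintedService exists in the SPI under 
discussion; the names and the hard-coded child list are assumptions for 
illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical hint passed by the caller with a read request. */
enum ChildHint { CHILDREN_NEEDED, CHILDREN_NOT_NEEDED }

/** Simplified node info; childNames == null means "not included". */
final class HintedNodeInfo {
    final String path;
    final List<String> childNames;

    HintedNodeInfo(String path, List<String> childNames) {
        this.path = path;
        this.childNames = childNames;
    }
}

/** Hypothetical service that reads children only when the hint asks for them. */
final class HintedService {
    int storageReads; // counts simulated child lookups

    HintedNodeInfo getNodeInfo(String path, ChildHint hint) {
        if (hint == ChildHint.CHILDREN_NEEDED) {
            storageReads++;
            // Placeholder data; a real implementation would query its store.
            return new HintedNodeInfo(path, Arrays.asList("child1", "child2"));
        }
        // The caller said it does not need children, so skip the lookup.
        return new HintedNodeInfo(path, null);
    }
}
```

The implementation still makes the final decision; the hint only gives 
it the information it would otherwise have to guess.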

> - based on the known characteristics of the target node: e.g. we have folder and files and other nodes
>   and we assume that folders will be used for displaying the children so send it. for any other nodes we dont

See above -- doesn't work in practice.

> - based on the simple amount of child nodes if we know that (dont calc if more than 14)
> - based on an implementation-specific configuration
>   that could include nodetypes, number of child nodes, day time, session.userId, random... whatever
>   you feel would be appropriate, reasonable or simply a good thing for your specific store.
> 
> the last is pretty much what we discussed for the getItemInfos method for the batch read. we said
> that we cant add a config to the spi interfaces and want to leave that to the impl because we would
> not be able to find something that fits the needs for all potential implementations.

I do agree that the SPI impl needs to decide on things like that. But we 
have to give it sufficient information.

> if your store cant retrieve the child info you may
> - create your reposervice with a config and leave the decision to someone else
> - always calculate the child infos

Again, that doesn't work for the use case we're trying to solve. Or at 
least the one I thought we were trying to solve.

> - never calculate the child infos
> - decide based on the characteristics of the requested node 
> -...
> (see above)
> 
> so. i am not in favor of adding exceptions to the new method... at least not for the reasons presented so far.
> angela

I'm in favor of first clearly stating what we're trying to do, then 
creating tests to obtain measurements, and then re-discussing what 
needs to be done.

BR, Julian