jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: SPI caching, was: [jira] Resolved: (JCR-1361) Lock test assumes that changes in one session are immediately visible in different session
Date Thu, 07 Feb 2008 14:49:12 GMT
Julian Reschke wrote:
> I think I understand batch read, and how JCR2SPI would use that. What I
> don't see is how it helps in this case.
> 
> An SPI implementation *could* return ItemInfos for all children when the 
> NodeInfo for a collection is fetched, but how would it know that anybody 
> wants to see the members?

Angela and I discussed this some time ago and decided that, for now, we leave
it up to the implementation, basically for simplicity. See also the javadoc of
RepositoryService.getItemInfos().
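To make "left to the implementation" concrete, here is a minimal Java sketch of a batch read that decides on its own to return the requested node together with its direct children. All names (ItemInfoSketch, BatchReadSketch, readBatch) are hypothetical simplifications for illustration; they are not the actual Jackrabbit SPI types or the real getItemInfos() signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified item description: a path plus its depth
// relative to the node that was requested (0 = the requested node).
record ItemInfoSketch(String path, int relativeDepth) {}

// Hypothetical in-memory "persistent store" offering a batch read.
class BatchReadSketch {
    // parent path -> ordered child paths
    private final Map<String, List<String>> children = new HashMap<>();

    void addNode(String parentPath, String childPath) {
        children.computeIfAbsent(parentPath, p -> new ArrayList<>()).add(childPath);
        children.computeIfAbsent(childPath, c -> new ArrayList<>());
    }

    // Returns the requested node first, followed by its direct children.
    // Including the children is this implementation's own decision: the
    // caller passes no hint about which items it will actually read.
    List<ItemInfoSketch> readBatch(String nodePath) {
        List<String> kids = children.get(nodePath);
        if (kids == null) {
            throw new IllegalArgumentException("no such node: " + nodePath);
        }
        List<ItemInfoSketch> result = new ArrayList<>();
        result.add(new ItemInfoSketch(nodePath, 0));
        for (String kid : kids) {
            result.add(new ItemInfoSketch(kid, 1));
        }
        return result;
    }
}
```

Because the caller supplies no context, the implementation can only guess whether the children are worth shipping, which is the gap Julian's question points at.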

>>> I have the feeling that we're optimizing for the wrong use case here.
>>>
>>> If we can't make *read* access efficient enough, we're in trouble. 
>>> And I really don't want to require every SPI implementation to 
>>> subscribe to events from the underlying store, in particular if it's 
>>> remote (think HTTP).
>>
>> That's why I don't even want to get into this business. But if an
>> implementation wants to cache something, it is responsible for
>> maintaining it.
> 
> That's a broad statement.
> 
> JCR includes "refresh" for good reasons. Are you arguing that it's not 
> needed, and a JCR implementation is responsible for that as well?

It may be needed, but only at the upper level of a JCR implementation.
Session.refresh() only has an effect on the current session; it does not change
the persistent state, nor does it affect other sessions. Translating this into
the SPI design, where everything above the SPI is session-local
(transient changes, namespace mappings, etc.), the refresh IMO only belongs in
that layer and not in the SPI implementation, where we deal with the
persistent storage of items.
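A simplified model of this split, assuming a session-local layer that caches item states on top of an SPI-like store that always reads the current persistent state. The class names are hypothetical and this is not how jcr2spi is actually structured; refresh(boolean keepChanges) only mirrors the signature of javax.jcr.Session.refresh().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SPI-level view: every read goes to the current persistent state, no cache.
class PersistentStoreSketch {
    private final Map<String, String> values = new HashMap<>();

    String read(String path) { return values.get(path); }

    void write(String path, String value) { values.put(path, value); }
}

// Hypothetical session-local layer sitting above the SPI boundary.
class SessionLayerSketch {
    private final PersistentStoreSketch store;
    private final Map<String, String> cached = new HashMap<>();           // states loaded by this session
    private final Map<String, String> transientChanges = new HashMap<>(); // unsaved changes

    SessionLayerSketch(PersistentStoreSketch store) { this.store = store; }

    String getProperty(String path) {
        if (transientChanges.containsKey(path)) {
            return transientChanges.get(path);
        }
        // Load from the store once, then serve from this session's cache.
        return cached.computeIfAbsent(path, store::read);
    }

    void setProperty(String path, String value) { transientChanges.put(path, value); }

    void save() {
        transientChanges.forEach(store::write);
        cached.putAll(transientChanges);
        transientChanges.clear();
    }

    // Drops this session's cached states (and, with keepChanges == false, its
    // transient changes). Neither the store nor other sessions are affected.
    void refresh(boolean keepChanges) {
        if (!keepChanges) {
            transientChanges.clear();
        }
        cached.clear();
    }
}
```

Staleness lives only in the session layer here: the store always answers with its current state, and a session picks up other sessions' saves once it refreshes.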

> I think that would be a fundamentally bad idea, because whether cached
> information needs to be fresh depends on what the client does. There's
> no way the JCR or the SPI implementation could know.

I'm open to discuss this issue, but to me this is rather about a more 
intelligent batch read.

> If a client does a collection listing, asking for a limited set of 
> properties of the members (name, timestamps, mime type, length), it 
> really doesn't care much. However, the SPI implementation has no 
> knowledge about the context in which the information in the NodeInfo is 
> needed, and thus has no way to optimize the operation.

I agree, but this shouldn't be solved individually in each SPI implementation
using a cache. To me it seems the batch read should be more intelligent and
receive additional information about what is actually needed. We might want to
introduce something like BatchReadConfig into the SPI [1].
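A minimal sketch of such a hint, assuming it maps a node type name to the number of levels a batch read should include below a node of that type (0 meaning the node alone, -1 meaning the whole subtree). BatchReadHintSketch, TreeNodeSketch and HintedBatchReadSketch are hypothetical and only illustrate the idea; they do not reproduce the BatchReadConfig class referenced in [1].

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hint: how many levels below a node of a given type to include.
class BatchReadHintSketch {
    static final int DEPTH_INFINITE = -1;
    private final Map<String, Integer> depthByNodeType = new HashMap<>();

    void setDepth(String nodeType, int depth) { depthByNodeType.put(nodeType, depth); }

    int getDepth(String nodeType) { return depthByNodeType.getOrDefault(nodeType, 0); }
}

// Hypothetical in-memory tree node.
class TreeNodeSketch {
    final String name;
    final String nodeType;
    final List<TreeNodeSketch> children = new ArrayList<>();

    TreeNodeSketch(String name, String nodeType) {
        this.name = name;
        this.nodeType = nodeType;
    }

    TreeNodeSketch add(TreeNodeSketch child) {
        children.add(child);
        return this;
    }
}

class HintedBatchReadSketch {
    // Collects the start node and its descendants, down to the depth the hint
    // allows for the start node's type.
    static List<String> batchRead(TreeNodeSketch start, BatchReadHintSketch hint) {
        List<String> out = new ArrayList<>();
        collect(start, hint.getDepth(start.nodeType), out);
        return out;
    }

    private static void collect(TreeNodeSketch node, int remaining, List<String> out) {
        out.add(node.name);
        if (remaining == 0) {
            return;
        }
        int next = remaining == BatchReadHintSketch.DEPTH_INFINITE ? remaining : remaining - 1;
        for (TreeNodeSketch child : node.children) {
            collect(child, next, out);
        }
    }
}
```

With setDepth("nt:folder", 1), a collection listing could fetch a folder and its members in one call, while reads of other node types stay shallow by default.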

>>> JCR clients today cannot rely on fresh session information unless
>>> they do a refresh(), and it's unclear to me why we would require that
>>> from an SPI implementation.
>>
>> It is a fundamental requirement that the SPI implementation provides
>> the most up-to-date item that is available. The refresh semantic is
>> only relevant in the context of jcr2spi, not of the SPI itself.
> 
> Where does this requirement come from? Is it stated somewhere?

It's not stated explicitly, but the RepositoryService says:

"The RepositoryService interface defines methods used to retrieve information 
from the persistent layer of the repository as well as the methods that modify 
its persistent state."

And RepositoryService.getItemInfos() says:

"Method used to 'batch-read' from the persistent storage."

Note that both refer to the persistent layer or storage, which is why I
understand that there shouldn't be a stale cache in between.

> Did you 
> ever try to compare performance between native Jackrabbit, and an SPI 
> based solution for operations like the one mentioned above?

Yes, I did, but the numbers very much depend on the setup. If there is a
remoting layer in between, the SPI-based repository is significantly slower
because there are lots of round-trips. If everything is in one process, the
difference is much smaller. The number of SPI calls, however, can be reduced
significantly when the batch read is configured properly and JCR-1011 is in use.

>> Again any call using a SessionInfo should return the most up-to-date 
>> item(s) that are requested.
> 
> Requiring this sounds nice in theory, but I'm *very* skeptical that it
> works in practice.

That's why I wrote 'should' ;)

I think it does no harm if an SPI implementation provides an item that is
slightly out of date, because the moment an item is delivered it may already
have been modified by another session. An SPI client must be able to handle
that situation; that is what the InvalidItemStateException is for.
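One common way for a store to detect that a client acted on an out-of-date item is an optimistic check against a per-item modification count. The sketch below assumes that approach; StaleItemException is a hypothetical stand-in for javax.jcr.InvalidItemStateException (used to keep the example dependency-free), and nothing here describes how Jackrabbit itself detects conflicts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for javax.jcr.InvalidItemStateException.
class StaleItemException extends Exception {
    StaleItemException(String message) { super(message); }
}

// Hypothetical store that tracks a modification count per item.
class VersionedStoreSketch {
    record Snapshot(String value, long modCount) {}

    private final Map<String, Snapshot> items = new HashMap<>();

    synchronized Snapshot read(String path) {
        return items.getOrDefault(path, new Snapshot(null, 0));
    }

    // Succeeds only if the caller based its change on the latest state;
    // otherwise the item was modified by someone else after it was delivered.
    synchronized void write(String path, String value, long expectedModCount)
            throws StaleItemException {
        long current = read(path).modCount();
        if (current != expectedModCount) {
            throw new StaleItemException(path + " was modified by another session");
        }
        items.put(path, new Snapshot(value, current + 1));
    }
}
```

Delivering a slightly stale item is harmless under this scheme: the conflict surfaces at write time, and the client can re-read and retry.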

>>  > If the JCR client does call "refresh()", we really should pass that
>>  > information to SPI, either by a new method (which could be more
>>  > elaborate than just refresh() as mentioned by Angela), or [...]
>>
>> That's IMO a more relevant use case that we should consider rather 
>> than caching.
> 
> I'm not sure how this is a different use case, but I really don't care 
> for the motivation.
> 
> At the end of the day, what we should do is *measure* the performance of 
> JCR2SPI compared to native implementations. I'll try to submit a few 
> tests soon.

We already have some tests. Just build Jackrabbit and compare
jackrabbit-core with jackrabbit-jcr2spi. On my machine, jackrabbit-core
runs the API tests in 33 seconds, while jackrabbit-jcr2spi runs them in 48
seconds. That means the additional SPI layers add about 45% overhead.
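The 45% figure is the relative increase in run time, using the two timings quoted above:

```latex
\[
\text{overhead} = \frac{t_{\text{jcr2spi}} - t_{\text{core}}}{t_{\text{core}}}
= \frac{48\,\mathrm{s} - 33\,\mathrm{s}}{33\,\mathrm{s}}
= \frac{15}{33} \approx 0.455 \approx 45\,\%
\]
```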

regards
  marcel

[1] http://svn.apache.org/repos/asf/jackrabbit/tags/1.4/jackrabbit-spi2jcr/src/main/java/org/apache/jackrabbit/spi2jcr/BatchReadConfig.java
