Mailing-List: contact cayenne-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cayenne-user@incubator.apache.org
Received-SPF: unknown (idunn.apache.osuosl.org: domain tiscali.it does not
 designate 62.241.5.253 as permitted sender)
Message-ID: <451927B4.1030208@tiscali.it>
Date: Tue, 26 Sep 2006 15:14:28 +0200
From: Francesco Fuzio <francescofuzio@tiscali.it>
User-Agent: Thunderbird 1.5.0.7 (Windows/20060909)
MIME-Version: 1.0
To: cayenne-user@incubator.apache.org
Subject: Re: Caching query results
References: <4517CE61.5010007@tiscali.it>
 <B6C595D9-A26D-44C9-A37B-4C9C6EAAB0D7@ish.com.au>
 <395395C9-704B-4CD4-AB31-A3992958C559@objectstyle.org>
 <4517EE35.2020801@tiscali.it>
 <58AFBB6D-9860-47B8-B04E-9A43C2501FBF@objectstyle.org>
In-Reply-To: <58AFBB6D-9860-47B8-B04E-9A43C2501FBF@objectstyle.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Andrus,

First of all, thank you for the prompt support and your suggestions.

As you correctly guessed, I was talking about Cayenne 1.2.1.

I was thinking about the following automatic (but configuration-driven)
"invalidation algorithm":

Configure somewhere a logical association DataObject ---> "QueryData"[].
For example, the Paintings DataObject would be associated with the
queries [Select * from Paintings where year = 1300 | Select * from
Artists, Paintings where Paintings.year > 1200 | Select * from Artists,
Paintings where Paintings.year > 1200 order by Paintings.name].

The "QueryData" object should hold the "expression" (i.e. the "where"
part) and the ordering as separate pieces of information, e.g.:
"Select * from Artists, Paintings where Paintings.year > 1200 order by
Paintings.name" ---> { Paintings.year > 1200 | order by Paintings.name }
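
The association described above could be held in a small registry. The
following is only a sketch under stated assumptions: the names
QueryData fields, QueryRegistry, register and queriesFor are
hypothetical and are not part of the Cayenne API, and the "where" part
is kept as a plain string here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder: the "where" part and the ordering kept separately. */
final class QueryData {
    final String where;          // e.g. "year > 1200"
    final List<String> orderBy;  // e.g. ["name"]; empty when the query is unordered

    QueryData(String where, List<String> orderBy) {
        this.where = where;
        this.orderBy = Collections.unmodifiableList(new ArrayList<>(orderBy));
    }
}

/** Hypothetical registry: entity name -> queries whose cached results depend on it. */
final class QueryRegistry {
    private final Map<String, List<QueryData>> byEntity = new HashMap<>();

    /** Associates a query description with the given entity. */
    void register(String entityName, QueryData query) {
        byEntity.computeIfAbsent(entityName, k -> new ArrayList<>()).add(query);
    }

    /** Returns the queries registered for the entity, or an empty list. */
    List<QueryData> queriesFor(String entityName) {
        return byEntity.getOrDefault(entityName, Collections.<QueryData>emptyList());
    }
}
```

On every commit, only the list returned by queriesFor for the modified
entity would need to be checked, not every cached query.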

When a DataObject is modified (or created, or deleted), check its state
before and after the modification against each associated QueryData.
This could be done with the in-memory object filtering that Expression
provides (or, optionally, with a third-party utility, perhaps Commons
BeanUtils?):

// the path is relative to the objects being filtered (here: Paintings)
Expression filter = Expression.fromString("year > 1200");
List matching = filter.filterObjects(objects);

As a result we get two sets of queries: those the object matched before
the modification and those it matches after it.
Every query that appears in only one of the two sets (i.e. outside
their intersection) must certainly have its cached result invalidated.

For the queries in the intersection:

a) If they have NO ordering (order by clause, paging limits, etc.),
they are still valid.

b) If they have ordering: when the ordering uses one of the modified
DataObject fields, the query result must be invalidated; otherwise it
is still valid.
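
The decision rule above can be sketched in plain Java. This is an
illustration under stated assumptions, not Cayenne code: CachedQuery
and QueryInvalidator are hypothetical names, an object's state is
modelled as a Map of field values, and a java.util.function.Predicate
stands in for the Cayenne Expression. A null "before" means the object
was created; a null "after" means it was deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical description of one cached query. */
final class CachedQuery {
    final String cacheKey;                        // key of the cached result list
    final Predicate<Map<String, Object>> filter;  // stand-in for the "where" part
    final Set<String> orderingFields;             // fields used for ordering; empty if unordered

    CachedQuery(String cacheKey, Predicate<Map<String, Object>> filter,
                Set<String> orderingFields) {
        this.cacheKey = cacheKey;
        this.filter = filter;
        this.orderingFields = orderingFields;
    }
}

final class QueryInvalidator {
    /** Returns the cache keys whose results can no longer be trusted. */
    static List<String> keysToInvalidate(List<CachedQuery> queries,
                                         Map<String, Object> before,
                                         Map<String, Object> after,
                                         Set<String> modifiedFields) {
        List<String> stale = new ArrayList<>();
        for (CachedQuery q : queries) {
            boolean matchedBefore = before != null && q.filter.test(before);
            boolean matchedAfter = after != null && q.filter.test(after);
            if (matchedBefore != matchedAfter) {
                // The object entered or left the result: outside the intersection.
                stale.add(q.cacheKey);
            } else if (matchedBefore && touchesOrdering(q, modifiedFields)) {
                // Case b): still matching, but its position in the list may change.
                stale.add(q.cacheKey);
            }
            // Otherwise (case a, or never matching) the cached result stays valid.
        }
        return stale;
    }

    private static boolean touchesOrdering(CachedQuery q, Set<String> modifiedFields) {
        for (String field : q.orderingFields) {
            if (modifiedFields.contains(field)) {
                return true;
            }
        }
        return false;
    }
}
```

The cost is one filter evaluation per registered query for the "before"
state and one for the "after" state, so it grows linearly with the
number of queries associated with the entity.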

Of course this solution can be computationally expensive, depending on
the number of queries it has to check.
But in the project I am working on, for example, the DB is the system
under pressure (the bottleneck), while the middleware carries much less
load. In such a situation, moving load from the DB to the middleware
benefits the application as a whole.

For "basic" queries (I ran some tests) I think the algorithm should
work. Of course, more systematic test cases are needed to fully
validate it and/or find its limitations.
Anyway, I wanted to share it with you, hoping it can be useful or serve
as "inspiration" for a proper, more correct solution.


Francesco.


Andrus Adamchik wrote:
> Hi Francesco,
>
>
> On Sep 25, 2006, at 10:56 AM, Francesco Fuzio wrote:
>> Thank you for the answers: I'm definitely looking forward to trying 
>> the 3.0 cool features you mentioned.
>>
>> As for 2.1 (since for us is important to keep data updated without 
>> relying on expiration timing) I was thinking about this approach (for 
>> a clustered environment)
>
> That would be version 1.2.*, right?
>
>> 1) Enable Cayenne Replicated Shared Object Cache
>> 2) Disable Cayenne Query (i.e list ) Cache
>> 3) Use a caching framework supporting an automatic distributed 
>> refresh/invalidation policy (e.g. OSCache or Ehcache) to save query 
>> results as lists of ObjectIds.
>> 4) In case of Query "Cache Hit" use the cached ObjectId's to retrieve 
>> the associated DataObjects via the DataContext [public Persistent 
>> localObject(ObjectId id, Persistent prototype)]
>>
>> What do you think, is this approach reasonable? Will it work?
>
> This should work (you'll just use your own cache as a front end to the 
> DataContext query API), and should provide a clean path to the future 
> 3.0 migration. You'll need to consider a few things though:
>
> A. Query cache key generation. In 1.2 this is based on Query name 
> which is pretty dumb and barely usable; in 3.0 SelectQuery and 
> SQLTemplate are smart enough to build the cache key based on their 
> state. You may copy some of that code.
>
>
> B. Invalidation Strategies. That's a tricky one....
>
> I couldn't come up with a well-performing generic solution (I tried, 
> see CAY-577). Consider that events that may cause automatic 
> invalidation are object deletion, insertion and updating (update can 
> affect the ordering and also whether an object still matches the query 
> condition). So *every* commit can potentially invalidate any number of 
> cached lists for a given entity.
>
> The trick is to create an efficient algorithm to invalidate just the 
> right cache entries and avoid invalidating the entire entity cache. 
> Manually scanning and rearranging all lists on every commit is of 
> course very inefficient.
>
> So in 3.0 we added "cache group" notion so that users could categorize 
> queries based on some criteria and then invalidate the whole category 
> of cache entries. (Cache group notion is supported by OSCache by the 
> way). Here is an example.... Consider a "BlogPost" entity. All queries 
> that fetch a date range of BlogPosts can be arbitrarily divided into 
> "old_posts" and "new_posts" categories. So once a user 
> updates/deletes/removes a BlogPost, a code can check the date of this 
> post and invalidate either "old_posts" or "new_posts".
>
> This is just one solution that we came up with. Not automatic, but 
> fairly simple and efficient. You can come up with your own strategies. 
> If you can think of a better generic algorithm for invalidation, 
> please share.
>
> Andrus