openjpa-dev mailing list archives

From Fernando Padilla
Subject Re: slices, collocation
Date Sun, 23 Nov 2008 20:22:24 GMT
Hmm.  So maybe I was too quick to say that the collocation constraint is 
too inhibiting.  Coming from my expectations of what a sharding ORM 
system would provide for me, it definitely is too constraining.  But I 
promise to give it more thought; maybe for different use cases it's 
still OK.  So I'll continue to think on this.

But I ask you guys to think about the use cases that can't be 
implemented, and the usability costs that the collocation constraint 
places on the system.

I know that with sharding you can never execute a join across databases, 
so fancier queries will not execute as expected.  But baking that 
limitation of sharding into the data model system itself seems like 
overdoing it.  Just warning people that they have to be careful not to 
traverse relations that are not collocated would be fine; we're not 
children after all :) :)

But like I said, we're taking a big bet that OpenJPA slices will fit our 
scale out requirements.  So thank you!  This is an amazing head start, 
and looks solidly built and coded.  So I'll keep thinking on this, the 
limitations and possibilities :)  And my complaints are pretty minor in 
the big picture.

For example, I have a work-around for the collocation constraint; I'm 
just seeing if we can make the system nicer and easier to use.  My 
work-around would be to store references to objects (ids), not the 
objects themselves (cross-db joins are impossible).  Then in our 
application we'll load the referenced objects as desired, so that we 
maintain the relations, not the ORM system.
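
A minimal sketch of that work-around in plain Java.  Everything here is 
hypothetical and only for illustration: the class names, the fields, and 
the UserLookup interface are not part of OpenJPA or Slice.  In a real 
mapping Comment and User would be JPA entities, and the lookup would 
presumably delegate to EntityManager.find(User.class, id); the in-memory 
map below only stands in for that so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity: holds only the ids of the related User and Group,
// so the ORM never has to traverse a relation into another database.
final class Comment {
    final long id;
    final long userId;   // plain id instead of a ManyToOne reference to User
    final long groupId;  // plain id instead of a ManyToOne reference to Group
    final String text;

    Comment(long id, long userId, long groupId, String text) {
        this.id = id;
        this.userId = userId;
        this.groupId = groupId;
        this.text = text;
    }
}

// Hypothetical entity for the referenced side.
final class User {
    final long id;
    final String name;

    User(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Assumption: some way to find a User by id in whichever database holds it.
interface UserLookup {
    Optional<User> findById(long id);
}

// In-memory stand-in for the real lookup, used only to keep the sketch runnable.
final class InMemoryUserLookup implements UserLookup {
    private final Map<Long, User> users = new HashMap<>();

    void add(User user) {
        users.put(user.id, user);
    }

    @Override
    public Optional<User> findById(long id) {
        return Optional.ofNullable(users.get(id));
    }
}

// The application, not the ORM, resolves the Comment -> User relation.
final class CommentService {
    private final UserLookup lookup;

    CommentService(UserLookup lookup) {
        this.lookup = lookup;
    }

    Optional<User> authorOf(Comment comment) {
        return lookup.findById(comment.userId);
    }
}
```

The cost is the one described above: the relation is invisible to the 
ORM, so a dangling id shows up as an empty Optional that the application 
has to handle itself.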

Fernando Padilla wrote:
> right, thank you :)
> you have re-confirmed how I thought the collocation constraint worked, 
> and you also gave me a great motivation why the "replicated" feature 
> came about ( as a work around for the collocation constraint ).
> So now we're back to square one.  Looking at my example use case, the 
> collocation constraint is still too inhibiting.  I want to get rid of 
> those requirements! :)
> So if you wanted to remove that requirement, how would you go about it? 
>  What code would you look at, etc etc.  If I want to put work into 
> fixing this up, where should I begin to look, etc etc.  what are some 
> possible plans.. :) :) :)
> Pinaki Poddar wrote:
>>   One key aspect of data distribution model used in Slice is that the
>> distribution policy is based at instance level and *not* at class level.
>> What it implies for your given scenario is that while User U1 instance
>> can be persisted in Slice A, another User instance U2 can be stored in
>> Slice B. So it is not necessary that all User instances are stored in
>> one Slice and all Comment instances are in a different slice and so
>> forth.
>>   But what about related instances? For the sake of concreteness let us
>> consider the following instances and relations:
>>   User U1 belongs to Group G1 and has commented C11, C12, C13
>>   User U2 belongs to Group G1 and has commented C21
>> The distribution policy determines that U1 and U2 are stored in Slice A
>> and B respectively.
>> The collocation constraint forces that any instance reachable from U1
>> (i.e. closure of U1 in Graph theory terms) is stored in Slice A and any
>> instance reachable from U2 is stored in Slice B. Thus, C11, C12, C13 go
>> to Slice A while C21 goes to Slice B.
>> Where does G1 go? G1 is reachable from both U1 and U2. The only current
>> option is G1 is annotated as @Replicated and identical copies of G1 are
>> stored in both Slice A and B.
>> Of course, collocation constraint will prohibit G1 from having a
>> relation to U1 and U2. So, @Replicated mainly serves to model 'master'
>> data, i.e. data that is referred to by many but itself refers to none.
>> However, the relationship is not completely lost. For example, a query
>> such as    select u from User u where'G1'" will fetch both U1 and U2 by
>> executing parallel queries across Slice A and B and merging the results.
>> Fernando Padilla wrote:
>>> So, now that I have some attention, I'll post up a question I sent 
>>> out a month ago.
>>> I want to make a connected datamodel, but I want to put objects on 
>>> different databases..
>>> Let's say I have 3 objects:
>>> User (slice root)
>>>   - name
>>> Group (slice root)
>>>   - name
>>>   - users
>>> Comment (slice grouped with group)
>>>   - group
>>>   - user
>>>   - text
>>> As you can see they are all inter-related.  But let's say I want to 
>>> distribute Users and Groups across databases.  They are related, 
>>> but can't be collocated.
>>> So can you help me understand the "collocation" limitation of slices, 
>>> and a way to enhance it to remove this limitation ( if I understand 
>>> it properly ).
>>> ps - If I understand the limitation, I can't have a ManyToMany 
>>> relationship from Group to Users, or ManyToOne from Comment to User, 
>>> instead I would have to have a set of userIds.  And I would have to 
>>> load up each user object myself through code.
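
The collocation rule from Pinaki's U1/U2/G1 example can be traced with a 
small stand-alone sketch.  This is not Slice code: CollocationSketch and 
its methods are hypothetical, and instances are represented only by 
their names.  Each root instance is given a slice, every instance 
reachable from it is forced into the same slice, and an instance that 
ends up with more than one slice (G1 here) is the case where replication 
is currently the only option.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of the collocation constraint on an object graph.
final class CollocationSketch {
    // Instance name -> names of the instances it references.
    private final Map<String, List<String>> refs = new HashMap<>();

    void relate(String from, String... to) {
        refs.computeIfAbsent(from, k -> new ArrayList<>()).addAll(Arrays.asList(to));
    }

    // For each root and its assigned slice, walk the closure of the root and
    // record that slice for every reachable instance.
    Map<String, Set<String>> place(Map<String, String> rootSlices) {
        Map<String, Set<String>> placement = new TreeMap<>();
        for (Map.Entry<String, String> root : rootSlices.entrySet()) {
            Deque<String> todo = new ArrayDeque<>();
            Set<String> seen = new HashSet<>();
            todo.push(root.getKey());
            while (!todo.isEmpty()) {
                String node = todo.pop();
                if (!seen.add(node)) {
                    continue; // already visited from this root
                }
                placement.computeIfAbsent(node, k -> new TreeSet<>()).add(root.getValue());
                for (String next : refs.getOrDefault(node, Collections.emptyList())) {
                    todo.push(next);
                }
            }
        }
        return placement;
    }
}
```

With U1 related to G1, C11, C12, C13 and U2 related to G1, C21, placing 
U1 in slice "A" and U2 in slice "B" leaves the comments in a single 
slice each, while G1 is claimed by both slices.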
