Subject: RE: Extension to OpenJPA for distributed databases
Date: Thu, 31 Jan 2008 16:16:55 -0800
From: "Pinaki Poddar" <ppoddar@bea.com>
To: <dev@openjpa.apache.org>

Hi Kevin,
    Thank you for your interest and valuable suggestions.

   > Is this support meant for databases that do not support partitions
   > directly?
Slice is targeted at environments with multiple stand-alone database
instances, possibly even heterogeneous ones. If an application wants to
bring data from these database instances into a *single* in-memory
persistence context, then Slice can be useful.
For a database vendor that supports horizontal partitioning, one is
better off with standard OpenJPA, and of course data distribution then
becomes a decision about the partition key rather than a user-defined
policy plug-in.

 > The DistributionPolicy interface seems a bit limiting.
The contract is this: Slice calls back with the list of configured slices
and a newly persisted persistence-capable instance X, and the user says
which slice should store X.

> The slice names in the configuration can not change without a
> corresponding change in the DistributionPolicy callback.
  Yes and no. I am still thinking about what to do with this issue, and
thank you for your input. However, one guiding principle I would like to
adhere to is:
 "Entity classes must be agnostic of the partitioned database
environment".

  Why I said no: let us consider a concrete example. I am going to store
every Person whose name sorts before 'John Doe' in the first slice and
the rest in another. So my DistributionPolicy implementation looks like

  String distribute(Object pc, List<String> slices, Object ctx) {
   // names that sort before "John Doe" go to the first configured slice
   if (((Person)pc).getName().compareTo("John Doe") < 0)
      return slices.get(0);
   return slices.get(1);
  }

 In my configuration how the slices are logically named is immaterial in
such a case. I can call them
   <property name="slice.One.ConnectionURL" value="jdbc://URL1"/>   
   <property name="slice.Two.ConnectionURL" value="jdbc://URL2"/>

 And later edit them to
   <property name="slice.ABC.ConnectionURL" value="jdbc://URL1"/>   
   <property name="slice.XYZ.ConnectionURL" value="jdbc://URL2"/>  

without any change in application behavior.
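
As a rough, self-contained illustration of the point above, the sketch below applies the stated rule (names sorting before 'John Doe' go to the first slice) and shows that renaming the slices only changes the returned label, not the placement. DistributionPolicySketch is a hypothetical stand-in that mirrors the callback signature quoted in this message; it is not the actual Slice interface, and Person, ByNamePolicy and PolicyDemo are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the callback signature quoted in this
// message; it is NOT the real Slice interface.
interface DistributionPolicySketch {
    String distribute(Object pc, List<String> slices, Object ctx);
}

// Minimal entity used only for this illustration.
class Person {
    private final String name;
    Person(String name) { this.name = name; }
    String getName() { return name; }
}

// Chooses a slice by its position in the configured list, so the logical
// slice names never appear in application code.
class ByNamePolicy implements DistributionPolicySketch {
    @Override
    public String distribute(Object pc, List<String> slices, Object ctx) {
        boolean beforeJohnDoe =
            ((Person) pc).getName().compareTo("John Doe") < 0;
        return beforeJohnDoe ? slices.get(0) : slices.get(1);
    }
}

class PolicyDemo {
    public static void main(String[] args) {
        DistributionPolicySketch policy = new ByNamePolicy();
        Person alice = new Person("Alice Smith");

        // Same policy, two different naming schemes for the same two slices.
        List<String> before = Arrays.asList("One", "Two");
        List<String> after = Arrays.asList("ABC", "XYZ");

        System.out.println(policy.distribute(alice, before, null));
        System.out.println(policy.distribute(alice, after, null));
    }
}
```

In both calls the instance lands in the slice at position 0; only the label differs. The sketch does assume the order of the configured slices stays the same across the rename.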


  > Maybe the callback could return an opaque Object based on whatever
  > (key?) that could then be used by our runtime to determine the proper
  > slice?  With ObjectGrid, we did this via a PartitionableKey interface
  > that the primary key would have to implement.

   "via a PartitionableKey interface that the primary key would have to
implement." -- this is what possibly violates my guiding principle.
But maybe I need to understand your suggested solution better.
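
To make the trade-off concrete, here is a minimal sketch of what the key-based approach described in the question might look like. Everything in it is an assumption for illustration: PartitionableKeySketch is modeled only on the description in this thread (a getPartition() method returning an Object) and is not the ObjectGrid API, and hashing the returned value onto the list of slices is just one guess at how a runtime could map it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface modeled on the description in this thread;
// it is NOT the ObjectGrid API.
interface PartitionableKeySketch {
    Object getPartition();
}

// A primary key class that exposes an opaque partition value. The key
// class itself has to know that partitioning exists.
class PersonKey implements PartitionableKeySketch {
    private final int id;
    PersonKey(int id) { this.id = id; }

    @Override
    public Object getPartition() {
        return Integer.valueOf(id);
    }
}

class SliceResolver {
    // Assumption: the runtime hashes the opaque value onto the configured
    // slices. floorMod keeps the index non-negative for negative hashes.
    static String resolve(PartitionableKeySketch key, List<String> slices) {
        int index = Math.floorMod(key.getPartition().hashCode(), slices.size());
        return slices.get(index);
    }

    public static void main(String[] args) {
        List<String> slices = Arrays.asList("One", "Two");
        System.out.println(resolve(new PersonKey(4), slices));
        System.out.println(resolve(new PersonKey(5), slices));
    }
}
```

Note that in this sketch the key class implements a partition-specific interface, which is the coupling between entity classes and the partitioned environment that the guiding principle above tries to avoid.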

 > When you mention possible "parallel execution", are you assuming the
 > use of the openjpa "multithreaded" property for the EntityManagers?  Or,
 > would this parallel execution utilize separate EntityManagers?

 Neither. A single EntityManager E uses a DistributedStoreManager DM,
which in turn holds connections to many databases DB1, DB2, etc. When a
JPQL query Q is issued by E, DM runs the same SQL query against DB1, DB2,
and so on, but each SQL query is executed on a separate thread drawn from
a pool. The results of the queries are collected, merged with ordering,
and returned to the caller as a single result list.
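
The fan-out and merge described above can be sketched with plain java.util.concurrent. This is only an illustration of the pattern, not the DistributedStoreManager code: FanOutQuery and its methods are hypothetical names, each "query" is a Callable returning an in-memory list instead of running SQL, and the merge simply sorts by natural order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FanOutQuery {
    // Submits one task per slice to the pool, waits for all of them,
    // then merges the partial results into a single ordered list.
    static <T extends Comparable<T>> List<T> runOnAllSlices(
            List<Callable<List<T>>> perSliceQueries,
            ExecutorService pool) throws Exception {
        List<Future<List<T>>> futures = pool.invokeAll(perSliceQueries);
        List<T> merged = new ArrayList<>();
        for (Future<List<T>> future : futures) {
            merged.addAll(future.get());
        }
        Collections.sort(merged); // stands in for "merged with ordering"
        return merged;
    }

    // Two fake slices, each returning part of the overall result.
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<List<String>>> queries = new ArrayList<>();
            queries.add(() -> Arrays.asList("Alice", "Carol"));
            queries.add(() -> Arrays.asList("Bob", "Dave"));
            return runOnAllSlices(queries, pool);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller sees one ordered list and never learns which slice produced which element, which matches the single-EntityManager view described above.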
   
  > On first read, this support looks to be very cool for top-down
  > development.  Depending on your response to the first bullet, I find it
  > harder to understand how a customer might already have a poor-man's
  > version of partitioning and work upwards.  Just thinking out loud...

  We will have to wait for people to use it to know whether this makes
sense. Andy Schlaikjer is our first user, trying it on 100 database
instances. Maybe Andy should comment.


  Regards and thanks again for your interest --


Pinaki 


-----Original Message-----
From: Kevin Sutter [mailto:kwsutter@gmail.com] 
Sent: Thursday, January 31, 2008 4:34 PM
To: dev@openjpa.apache.org
Subject: Re: Extension to OpenJPA for distributed databases

Pinaki,
I like the idea.  I used to be involved with the ObjectGrid project here
at IBM and we used a similar technique for partitioning our in-memory
cache.  I have a few questions about Slice, but for the most part, I am
in favor of including it in the OpenJPA deliverable.

   - Basic question.  Is this support meant for databases that do not
   support partitions directly?  My experience has been that if a database
   supports partitioning directly, then the interaction with the database
   doesn't change at all.  That is, the application (or openjpa runtime in
   this case) does not have to change to take advantage of the
   partitioning.  It's transparent.  But, your documentation seems to
   indicate required slice configuration and callbacks.  I'm just trying
   to understand how you see this support fitting into the partitioned
   database landscape.
   - The DistributionPolicy interface seems a bit limiting.  The
   application code is now very tightly linked with the configuration.
   The slice names in the configuration can not change without a
   corresponding change in the DistributionPolicy callback.  I would
   prefer something more general.  Maybe the callback could return an
   opaque Object based on whatever (key?) that could then be used by our
   runtime to determine the proper slice?  With ObjectGrid, we did this
   via a PartitionableKey interface that the primary key would have to
   implement.  We would then callback on the getPartition() method to get
   the Object value which we would then use to determine the partition.
   This could be a String value, if so desired.  But, it also allowed
   other Object types as well.
   - When you mention possible "parallel execution", are you assuming the
   use of the openjpa "multithreaded" property for the EntityManagers?
   Or, would this parallel execution utilize separate EntityManagers?
   - On first read, this support looks to be very cool for top-down
   development.  Depending on your response to the first bullet, I find
   it harder to understand how a customer might already have a poor-man's
   version of partitioning and work upwards.  Just thinking out loud...

Like I said up-front, I like the basic idea of Slice.  I think we
probably need a bit more discussion on how this fits into the overall
database landscape and architecture, but eventually I would like to see
this become part of OpenJPA.  Thanks and nice work.

Kevin


On Jan 30, 2008 5:44 PM, Pinaki Poddar <ppoddar@bea.com> wrote:

> Hi,
>  I would like to add an extension of OpenJPA that allows an
> application to transact against a set of distributed, possibly
> heterogeneous, horizontally-partitioned databases [2]. The project is
> named Slice and is similar in scope to Hibernate Shards.
>  The development codebase has so far been maintained in the Apache Labs
> repository and, given its current state, I propose to add the codebase
> to a new openjpa-slice module.
>
>  I request you to review the current state of its implementation [1] and
> express your opinion/views on the feasibility of my proposal.
>
>  Regards --
>
> Pinaki
>
> [1] Slice website:
> http://people.apache.org/~ppoddar/slice/site/index.html
> [2] dev2dev blog:
> http://dev2dev.bea.com/blog/pinaki.poddar/archive/2008/01/slice_openjpa_f_1.html
>
> Notice:  This email message, together with any attachments, may 
> contain information  of  BEA Systems,  Inc.,  its subsidiaries  and  
> affiliated entities,  that may be confidential,  proprietary,  
> copyrighted  and/or legally privileged, and is intended solely for the
> use of the individual or entity named in this message. If you are not 
> the intended recipient, and have received this message in error, 
> please immediately return this by email and then delete it.
>
