Mailing-List: contact dev-help@openjpa.apache.org; run by ezmlm
Reply-To: dev@openjpa.apache.org
Subject: RE: Extension to OpenJPA for distributed databases
Date: Thu, 31 Jan 2008 16:16:55 -0800
Message-ID: <3992B07C0590B548BB294D31768A1DA2D3809A@repbex01.amer.bea.com>
In-Reply-To:
<89c0c52c0801311433r395b3c64of77c4de7234abb65@mail.gmail.com>
References: <3992B07C0590B548BB294D31768A1DA2D37CED@repbex01.amer.bea.com> <89c0c52c0801311433r395b3c64of77c4de7234abb65@mail.gmail.com>
From: "Pinaki Poddar"

Hi Kevin,
  Thank you for your interest and valued suggestion.

> Is this support meant for databases that do not support partitions directly?

Slice is targeted at environments with multiple stand-alone database instances, possibly even heterogeneous ones. If an application wants to bring data from these database instances into a *single* in-memory persistence context, then Slice can be useful.
For a database vendor that supports horizontal partitioning, one will be better off with standard OpenJPA, and of course, data distribution then becomes a decision around the partition key rather than a user-defined policy plug-in.

> The DistributionPolicy interface seems a bit limiting.

The contract is: Slice calls back with the list of configured slices and a new persistence-capable instance X, and the user tells which slice should store X.

> The slice names in the configuration can not change without a corresponding change in the DistributionPolicy callback.

Yes and no. I am still thinking about what to do with this issue, and thank you for your input. However, one guiding principle I would like to adhere to is "Entity classes must be agnostic of the partitioned database environment".

Why I said no: let us consider a concrete example. I am going to store all Person whose name is less than 'John Doe' in the first slice and the rest in another.
So my DistributionPolicy implementation looks like

    String distribute(Object pc, List<String> slices, Object ctx) {
        // Names that sort before "John Doe" go to the first slice.
        if (((Person) pc).getName().compareTo("John Doe") < 0)
            return slices.get(0);
        return slices.get(1);
    }

In my configuration, how the slices are logically named is immaterial in such a case. I can call them And later edit them to without any change in application behavior.

> Maybe the callback could return an opaque Object based on whatever (key?) that could then be used by our runtime to determine the proper slice? With ObjectGrid, we did this via a PartitionableKey interface that the primary key would have to implement.

"via a PartitionableKey interface that the primary key would have to implement." -- this is what possibly violates my guiding principle. But maybe I need to understand your suggested solution better.

> When you mention possible "parallel execution", are you assuming the use of the openjpa "multithreaded" property for the EntityManagers? Or, would this parallel execution utilize separate EntityManagers?

Neither. A single EntityManager E uses a DistributedStoreManager DM, which in turn holds connections to many databases DB1, DB2, etc. When a JPQL query Q is issued by E, DM runs the same SQL query against DB1, DB2, and so on, but each SQL query is executed on a separate thread drawn from a pool. The results of each query are collected, merged with ordering, and returned to the caller as a single result list.

> On first read, this support looks to be very cool for top-down development. Depending on your response to the first bullet, I find it harder to understand how a customer might already have a poor-man's version of partitioning and work upwards. Just thinking out loud...

We have to wait for people to use it to know whether this makes sense. Andy Schlaikjer is our first user, trying it on 100 database instances. Maybe Andy should comment.
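
The parallel query execution described above can be sketched roughly as follows. This is only an illustration under assumptions, not the Slice DistributedStoreManager code: the names ScatterGatherSketch, SliceQuery and executeAll are hypothetical, each "database" is a stand-in that returns a fixed list, and the merge is simplified to sorting the concatenated partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not part of Slice or OpenJPA.
public class ScatterGatherSketch {

    // Stand-in for "run the same SQL query against one database instance".
    public interface SliceQuery<T> {
        List<T> execute() throws Exception;
    }

    // Submits every per-slice query to the pool, waits for all of them,
    // and returns one list ordered by the given comparator.
    public static <T> List<T> executeAll(List<SliceQuery<T>> queries,
                                         Comparator<? super T> order,
                                         ExecutorService pool) throws Exception {
        List<Future<List<T>>> futures = new ArrayList<>();
        for (SliceQuery<T> query : queries) {
            Callable<List<T>> task = query::execute;
            futures.add(pool.submit(task));
        }
        List<T> merged = new ArrayList<>();
        for (Future<List<T>> future : futures) {
            merged.addAll(future.get()); // blocks until that slice has answered
        }
        merged.sort(order); // simplification: sort instead of a k-way merge
        return merged;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<SliceQuery<String>> queries = new ArrayList<>();
            queries.add(() -> Arrays.asList("Mary", "Adam")); // pretend DB1
            queries.add(() -> Arrays.asList("Zoe", "John"));  // pretend DB2
            // prints [Adam, John, Mary, Zoe]
            System.out.println(executeAll(queries, Comparator.<String>naturalOrder(), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller sees a single ordered list and never learns which slice a row came from, which is the point of running the per-database queries behind one EntityManager.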

Regards and thanks again for your interest --

Pinaki

-----Original Message-----
From: Kevin Sutter [mailto:kwsutter@gmail.com]
Sent: Thursday, January 31, 2008 4:34 PM
To: dev@openjpa.apache.org
Subject: Re: Extension to OpenJPA for distributed databases

Pinaki,
I like the idea. I used to be involved with the ObjectGrid project here at IBM and we used a similar technique for partitioning our in-memory cache. I have a few questions about Slice, but for the most part, I am in favor of including it in the OpenJPA deliverable.

- Basic question. Is this support meant for databases that do not support partitions directly? My experience has been that if a database supports partitioning directly, then the interaction with the database doesn't change at all. That is, the application (or openjpa runtime in this case) does not have to change to take advantage of the partitioning. It's transparent. But, your documentation seems to indicate required slice configuration and callbacks. I'm just trying to understand how you see this support fitting into the partitioned database landscape.

- The DistributionPolicy interface seems a bit limiting. The application code is now very tightly linked with the configuration. The slice names in the configuration can not change without a corresponding change in the DistributionPolicy callback. I would prefer something more general. Maybe the callback could return an opaque Object based on whatever (key?) that could then be used by our runtime to determine the proper slice? With ObjectGrid, we did this via a PartitionableKey interface that the primary key would have to implement. We would then callback on the getPartition() method to get the Object value which we would then use to determine the partition. This could be a String value, if so desired. But, it also allowed other Object types as well.

- When you mention possible "parallel execution", are you assuming the use of the openjpa "multithreaded" property for the EntityManagers?
Or, would this parallel execution utilize separate EntityManagers?

- On first read, this support looks to be very cool for top-down development. Depending on your response to the first bullet, I find it harder to understand how a customer might already have a poor-man's version of partitioning and work upwards. Just thinking out loud...

Like I said up-front, I like the basic idea of Slice. I think we probably need a bit more discussion on how this fits into the overall database landscape and architecture, but eventually I would like to see this become part of OpenJPA.

Thanks and nice work.
Kevin

On Jan 30, 2008 5:44 PM, Pinaki Poddar wrote:

> Hi,
>   I would like to add an extension of OpenJPA that allows an
> application to transact against a set of distributed, possibly
> heterogeneous, horizontally-partitioned databases [2]. The project is
> named Slice and is similar in scope to Hibernate Shards.
>   The development codebase has so far been maintained in the Apache Labs
> repository and, given its current state, I propose to add the codebase
> to a new openjpa-slice module.
>
>   I request you to review the current state of its implementation [1] and
> express your opinion/views on the feasibility of my proposal.
>
>   Regards --
>
> Pinaki
>
> [1] Slice website:
> http://people.apache.org/~ppoddar/slice/site/index.html
> [2] dev2dev blog:
> http://dev2dev.bea.com/blog/pinaki.poddar/archive/2008/01/slice_openjpa_f_1.html
>
> Notice: This email message, together with any attachments, may
> contain information of BEA Systems, Inc., its subsidiaries and
> affiliated entities, that may be confidential, proprietary,
> copyrighted and/or legally privileged, and is intended solely for the
> use of the individual or entity named in this message. If you are not
> the intended recipient, and have received this message in error,
> please immediately return this by email and then delete it.