From: "Roytman, Alex"
Cc: "Li, Hao", "Chen, Andrew"
Reply-To: open-jpa-dev@incubator.apache.org
Subject: Proposal: Optimizing empty collection fetch. Meta Column in ContainerFieldMappling
Date: Thu, 5 Oct 2006 18:52:53 -0400
Message-ID: <49BF64BA795F6A498EC2F04F39BAA8D3D97E62@ptint5.peacetech.com>

Hello Abe,

I would like to present a valid use case and a very useful performance enhancement. The idea is that if we know a collection field is empty, there is no need to fetch it. This can provide a truly dramatic performance improvement when, in a large set of instances, only some have a non-empty collection field.

Consider a very common case: composite (tree-like) data structures. Unlike the true Composite pattern, a typical tree structure does not have a special leaf class; that is, any node of the tree can potentially have sub-nodes. When traversing such a tree, as many as 70% of child-node fetches will yield an empty collection, because obviously the leaf level is the largest in a tree structure :-)

I wrote a prototype custom 1-N mapping that stores an "empty" flag (whether the collection is empty) on commit. On collection field load, if the flag is set to true (empty), it stores an empty collection into the StateManager instead of going to the database to fetch it. The results were dramatic: when traversing an 800-node tree, the number of "fetch-sub-nodes" SQL statements was cut from 800 to 130. Non-tree cases, where objects have a sparsely populated collection field, can be even more dramatic.

If concurrency of the collection field is controlled at the owning class level (the default), I think there is no danger of this flag getting out of sync with the actual collection content without entering a concurrent modification state. I have not had a chance to think through the transaction commit implications, if any.

There is a very nice facility in ContainerFieldMapping for indicating null container fields. I wonder why it is so hard-wired to empty/null and does not allow non-empty/empty/null differentiation and optimization. Is there any reason it is so restrictive?
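
To make the flag idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions, not the prototype mapping and not OpenJPA API: ChildStore, EmptyFlagDemo, commit, loadChildren and fetchCount are hypothetical names, in-memory maps stand in for the database, and the flag is assumed to be available with the owner row without an extra query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the database; counts simulated "fetch-sub-nodes" queries.
class ChildStore {
    private final Map<Integer, List<Integer>> childrenByParent = new HashMap<>();
    // Assumption: in a real mapping this flag would be a column on the owner row.
    private final Map<Integer, Boolean> emptyFlagByParent = new HashMap<>();
    int fetchCount = 0;

    // "Commit": persist the children and the empty flag together.
    void commit(int parentId, List<Integer> children) {
        childrenByParent.put(parentId, new ArrayList<>(children));
        emptyFlagByParent.put(parentId, children.isEmpty());
    }

    // "Load": consult the flag first and only query when it does not say "empty".
    List<Integer> loadChildren(int parentId, boolean useFlag) {
        if (useFlag && emptyFlagByParent.getOrDefault(parentId, false)) {
            return Collections.emptyList(); // known empty, no query issued
        }
        fetchCount++; // simulates one SELECT against the child table
        return childrenByParent.getOrDefault(parentId, Collections.emptyList());
    }
}

class EmptyFlagDemo {
    // Builds a complete binary tree with ids 1..nodeCount (children of i are 2i and 2i+1).
    static ChildStore buildTree(int nodeCount) {
        ChildStore store = new ChildStore();
        for (int id = 1; id <= nodeCount; id++) {
            List<Integer> children = new ArrayList<>();
            if (2 * id <= nodeCount) children.add(2 * id);
            if (2 * id + 1 <= nodeCount) children.add(2 * id + 1);
            store.commit(id, children);
        }
        return store;
    }

    // Depth-first traversal from nodeId; returns the number of nodes visited.
    static int traverse(ChildStore store, int nodeId, boolean useFlag) {
        int visited = 1;
        for (int child : store.loadChildren(nodeId, useFlag)) {
            visited += traverse(store, child, useFlag);
        }
        return visited;
    }

    public static void main(String[] args) {
        ChildStore plain = buildTree(15);
        traverse(plain, 1, false);
        ChildStore flagged = buildTree(15);
        traverse(flagged, 1, true);
        System.out.println("fetches without flag: " + plain.fetchCount);
        System.out.println("fetches with flag:    " + flagged.fetchCount);
    }
}
```

For the 15-node binary tree built in main, the traversal issues 7 simulated fetches with the flag instead of 15 without it, because the 8 leaf nodes are never queried. The sketch ignores concurrency and transaction commit ordering entirely.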
Are there any plans to make it a bit more flexible, or to directly implement the behavior I outlined above? I would greatly appreciate it if you could comment on this and maybe suggest the best approach to implementing it. Or maybe it is already implemented and I am missing it :-)

Best Regards,
Alex Roytman
Peace Technology, Inc