From: Michael Gentry
Date: Fri, 13 Nov 2009 11:05:45 -0500
Subject: Re: Object Caching
To: user@cayenne.apache.org

Hi Hans,

Even using a paginated query in Cayenne, it would've eventually
pulled everything into memory. The paginated query is really designed
to be used in a UI where the user is going to see a limited amount of
data and probably won't page too many times. The iterated query is the
best approach for processing a large number of records in Cayenne.

Good luck!

mrg


On Fri, Nov 13, 2009 at 10:49 AM, Hans Pikkemaat wrote:
> Hi,
>
> Of course I don't want to load the whole thing into memory.
> I want to run the query and use an iterator to go through the results.
> With paging, the JDBC driver is able to produce chunks, which prevents
> the whole result set from being loaded into memory.
>
> I was trying to accomplish the same thing using Cayenne, but clearly
> without success.
>
> So I'm going to fall back to a Cayenne iterated query or even plain JDBC.
>
> tx
>
> Hans
>
>
> Michael Gentry wrote:
>>
>> I'm not exactly sure what you are trying to accomplish, but could you
>> use plain SQL to do the job (run it from an SQL prompt)? That's the
>> approach I normally take when I have to do updates to large amounts of
>> data, especially for a one-off task or something ill-suited to Java
>> code. Even if you were using raw JDBC (no ORM) and tried to pull back
>> 2.5 million records, it would be difficult. I don't know the size of
>> the data record you are using, but even at 1k per record (not an
>> unreasonable size) it would require 2.5 GB of RAM just to hold the
>> records.
>>
>> mrg
>>
>>
>> On Fri, Nov 13, 2009 at 10:20 AM, Hans Pikkemaat wrote:
>>
>>> Hi,
>>>
>>> That was the initial approach I tried. The problem with this is that
>>> I cannot manually create relations between objects constructed from
>>> data rows. This means that when I access the detail table through the
>>> relation, it will execute a query to get them from the database.
>>>
>>> If I have 100 main records, it runs 100 queries to get all the
>>> details. This does not perform well. I need to run one query which
>>> does a left join and gets all the data in one go.
>>>
>>> But I totally agree with you that ORM is too much overhead here. I
>>> don't need caching or anything like that. Actually, I'm trying to
>>> prevent it from caching the records. I'm working on a solution now
>>> that uses the iterated query to return data rows, where I construct
>>> the new objects and the relationships between them myself.
>>>
>>> tx
>>>
>>> Hans
>>>
>>>
>>> Michael Gentry wrote:
>>>>
>>>> Not just Cayenne, Hans. No ORM efficiently handles the scale you are
>>>> talking about. You need to find a way to break your query down into
>>>> smaller chunks to process. What you are doing might be workable with
>>>> 50k records, but not 2.5m. Find a way to break your query down into
>>>> smaller units to process, or explore what Andrus suggested with
>>>> ResultIterator:
>>>>
>>>> http://cayenne.apache.org/doc/iterating-through-data-rows.html
>>>>
>>>> If you can loop over one record at a time and process it (thereby
>>>> letting the garbage collector clean out the ones you have processed),
>>>> then your memory usage should be somewhat stable and manageable, even
>>>> if the initial query takes a while.
>>>>
>>>> mrg
>>>>
>>>>
>>>> On Fri, Nov 13, 2009 at 7:09 AM, Hans Pikkemaat wrote:
>>>>
>>>>> Anyway, my conclusion is indeed: don't use Cayenne for large query
>>>>> processing.
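[Editor's note: the advice in this thread boils down to two points: the full result set is too big for memory, and streaming one row at a time keeps memory usage flat. The sketch below illustrates both in plain Java. The real Cayenne calls would be along the lines of `DataContext.performIteratedQuery()` returning a `ResultIterator` (see the linked docs); since that needs a live database, a plain `Iterator` over fake data rows stands in for it here, and `fakeRows` is a hypothetical helper, not a Cayenne API.]

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class StreamingSketch {

    // Back-of-envelope check from the thread: 2.5 million rows at
    // roughly 1k (1000 bytes) each is about 2.5 GB, far too much to
    // materialize in the heap at once.
    static long bytesNeeded(long rows, long bytesPerRow) {
        return rows * bytesPerRow;
    }

    public static void main(String[] args) {
        long total = bytesNeeded(2_500_000L, 1000L);
        System.out.printf("%.1f GB%n", total / 1e9);

        // Streaming pattern: process one row at a time so each row
        // becomes garbage-collectable as soon as you move past it.
        // With Cayenne this loop would use ResultIterator's
        // hasNextRow()/nextDataRow(), closed in a finally block to
        // release the underlying JDBC connection.
        Iterator<Map<String, Object>> rows = fakeRows(5);
        long processed = 0;
        while (rows.hasNext()) {
            Map<String, Object> row = rows.next();
            processed++; // stand-in for real per-row work on `row`
        }
        System.out.println("processed=" + processed);
    }

    // Hypothetical stand-in for a database cursor: yields rows lazily
    // instead of building the whole result set up front.
    static Iterator<Map<String, Object>> fakeRows(final int n) {
        return new Iterator<Map<String, Object>>() {
            int i = 0;
            public boolean hasNext() { return i < n; }
            public Map<String, Object> next() {
                Map<String, Object> row = new HashMap<>();
                row.put("ID", i++);
                return row;
            }
        };
    }
}
```

The same idea applies at the JDBC level: configure the driver to fetch in chunks (e.g. `Statement.setFetchSize`) rather than buffering the whole result set client-side.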