cayenne-user mailing list archives

From Michael Gentry <mgen...@masslight.net>
Subject Re: Fetching and processing a large amount of objects
Date Mon, 17 Dec 2012 16:39:20 GMT
Hi Simon,

I think I misunderstood something you said earlier, because I thought you
already had a "processed" flag you could query against.  Given that you
don't, and that I'm not sure why your performIteratedQuery() is failing,
perhaps you could combine data rows with paginated queries:

http://cayenne.apache.org/docs/3.0/data-rows.html
http://cayenne.apache.org/docs/3.0/paginated-queries.html

I suspect, however, that this will not scale as well as you need (I think
the paginated query will still fetch ~500k data rows).  You may end up
having to do an SQLTemplate query that fetches only the primary keys (which
is what a paginated query does), and then loop, fetching batches of your
records based upon those primary keys (using a new DataContext for each
batch, of course).  This is a bit more work, but it shouldn't run into the
memory issues.
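
A minimal sketch of the batching loop described above, written in plain Java
so it runs without Cayenne on the classpath. The names KeyBatchSketch,
processInBatches and the handler callback are hypothetical and not part of
the Cayenne API. The sketch assumes the primary keys have already been
fetched into a list; in a real application the handler would create a new
DataContext, fetch the objects for the keys in the batch, process them, and
commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper illustrating key-based batching; not Cayenne API. */
final class KeyBatchSketch {

    /**
     * Walks the key list in slices of at most batchSize keys and hands each
     * slice to the handler. Returns the total number of keys handled.
     */
    static <K> int processInBatches(List<K> keys, int batchSize,
                                    Consumer<List<K>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int handled = 0;
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            // Copy the slice so the handler gets an independent list.
            List<K> batch = new ArrayList<>(keys.subList(from, to));
            // Assumption: the handler does the per-batch work, e.g. with a
            // fresh context per batch so earlier objects can be collected.
            handler.accept(batch);
            handled += batch.size();
        }
        return handled;
    }

    public static void main(String[] args) {
        // Stand-in for the primary keys returned by a keys-only query.
        List<Long> keys = new ArrayList<>();
        for (long id = 1; id <= 2500; id++) {
            keys.add(id);
        }
        int handled = processInBatches(keys, 1000, batch ->
                System.out.println("would fetch and process "
                        + batch.size() + " records"));
        System.out.println("handled " + handled + " keys");
    }
}
```

Only the key list stays in memory for the whole run; each batch of full
records lives just as long as its handler call, which is the point of using
a new DataContext per batch.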

mrg



On Mon, Dec 17, 2012 at 10:40 AM, Simon Schneider <sschneider@mackoy.de> wrote:

> Hi Michael,
>
> I understand your approach of using a flag to identify already processed
> objects. But introducing a flag, or in my case another state, just for
> processing my records was something I wanted to avoid. I thought that
> Cayenne might have another way of fetching objects in a memory-efficient
> manner, perhaps some Iterator which fetches only the primary keys on
> creation and then, while iterating, fetches batches of data rows in the
> background.
>
> Simon
>
>
> On 17.12.2012 at 15:50, Michael Gentry wrote:
>
> > Hi Simon,
> >
> > I don't know why your performIteratedQuery() would fail with a heap error.
> > Based upon your answer to #2, it sounds like you can set a fetch limit on
> > your query (call query.setFetchLimit(limit), then do a normal
> > performQuery(), and you'll get back real Cayenne objects) and only pull
> > back 100 or 1000 records, process them (setting them to a different
> > state), then commit.  Do this in a new DataContext each time so the GC can
> > reclaim the memory.
> >
> > mrg
> >
> >
> >
> > On Mon, Dec 17, 2012 at 8:38 AM, Simon Schneider <sschneider@mackoy.de> wrote:
> >
> >> Hi Michael,
> >>
> >> The problem is that I do not even get an iterator, because executing a
> >> query like the following results in a Java heap space error:
> >>
> >> ResultIterator it = dataContext.performIteratedQuery(query);
> >>
> >> The answers to your questions are:
> >>
> >>> 1) How many records are you talking about?
> >> It's about half a million records
> >>
> >>> 2) Are you updating your object with a flag/etc you can query on again
> >>> later (to exclude objects you've already processed)?
> >> I already exclude objects by setting them to a different state. But it
> >> may happen that I have to process half a million records despite this.
> >>
> >>> 3) What version of Cayenne are you using and what database?
> >> Cayenne 3.0.2, Postgres 9.1
> >>
> >>> 4) When you convert your Map (from the iterated query) into a
> >>> DataObject, are you creating a new DataContext or using the old one over
> >>> and over again?
> >> At the moment I am using just one DataContext and unregistering the
> >> processed objects. But as mentioned above, execution does not even get to
> >> this point.
> >>
> >> Simon
> >>
> >>> Hi Simon, some questions:
> >>>
> >>> 1) How many records are you talking about?
> >>> 2) Are you updating your object with a flag/etc you can query on again
> >>> later (to exclude objects you've already processed)?
> >>> 3) What version of Cayenne are you using and what database?
> >>> 4) When you convert your Map (from the iterated query) into a
> >>> DataObject, are you creating a new DataContext or using the old one over
> >>> and over again?
> >>>
> >>> For #4, if you are using the same DataContext repeatedly, try changing
> >>> your logic to something more like:
> >>>
> >>> while (iterator.hasNextRow()) {
> >>>    DataContext context = DataContext.createDataContext();
> >>>    Map row = (Map) iterator.nextRow();
> >>>    CayenneObject object = (CayenneObject) context.objectFromDataRow("CayenneObject", row);
> >>>    ...
> >>>    object.doStuff();
> >>>    ...
> >>>    context.commitChanges();
> >>> }
> >>>
> >>> This way you won't build up a ton of objects in a single DataContext and
> >>> possibly run out of memory.
> >>>
> >>> mrg
> >>
>
>
