cayenne-dev mailing list archives

From John Huss <johnth...@gmail.com>
Subject Re: Cayenne object storage / memory usage
Date Thu, 06 Jul 2017 17:35:54 GMT
I'm very glad to see this moving forward! Very exciting! Thanks for your
work on this.
On Thu, Jul 6, 2017 at 8:32 AM Robert Zeigler <robert.zeigler@roxanemy.com>
wrote:

> Kudos on the improvements, and to the original developers (Andrus, et al)
> for a fantastic design. These days, I’ve been doing a lot more Python
> coding than Java and I use SQLAlchemy pretty extensively. It’s nice… but I
> still miss Cayenne’s simplicity/ease of use (SQLAlchemy uses a transaction
> model more akin to Hibernate, though not as egregious).
>
> Best,
>
> Robert
>
> > On Jul 6, 2017, at 7:27 AM, Andrus Adamchik <andrus@objectstyle.org> wrote:
> >
> > The fact that we can switch to field-based DataObjects with minimal
> > effort and without sacrificing a single thing in the Cayenne design is
> > a *very* big deal! Thanks John for bringing the possibility to
> > everyone's attention, and Nikita - for the working code and benchmarks.
> >
> > I am going to try this out on a real app some time next week. Very
> > exciting! :)
> >
> > Andrus
> >
> >
> >> On Jul 5, 2017, at 5:19 PM, Nikita Timofeev <ntimofeev@objectstyle.com> wrote:
> >>
> >> Hi all,
> >>
> >> I've run some additional benchmarks for field-based classes, inspired
> >> by John, and they were so promising that I've moved on
> >> to the implementation.
> >>
> >> So here is a pull request for you to review [1].
> >> Here [2] you can see what the new generated classes will look like.
> >>
> >> I see no visible downsides to this solution: both
> >> memory usage and speed are improved.
> >> All tests are clean, and the only minor incompatibility
> >> is that the HOLLOW state no longer resets an object's values [3]
> >> (though this can be implemented as well, I'm just
> >> not sure it is really needed).
> >>
> >> P.S. Here are some raw numbers from my benchmarks.
> >> I'm giving absolute numbers, but really only their ratios are
> >> important.
> >> Results for the old version are on the left, for the new version on
> >> the right.
> >>
> >> Memory usage:
> >> ==============
> >> 1. 10,000 small objects
> >> (int, Date and String ~ 20 chars)
> >> >>> 6 MB vs 2.5 MB <<<
> >>
> >> 2. 10,000 objects with big values
> >> (int, Date and String ~ 1K chars)
> >> For the same classes (same number of fields) the difference
> >> is just a constant, so this is only meant to give
> >> an idea of what to expect in different cases.
> >> >>> 24.5 MB vs 21 MB <<<
> >>
> >> Performance:
> >> ==============
> >> (numbers are in millions of ops per second, measured with a JMH
> >> benchmark)
> >> 1. Getter:
> >> >>> 107 vs 177 <<<
> >>
> >> 2. Setter:
> >> Not as impressive, since the Cayenne stack spends most of the
> >> time here processing the graph diff, but the new methods are still
> >> better.
> >> >>> 12.5 vs 14.5 <<<
> >>
> >> 3. readPropertyDirectly:
> >> >>> 152 vs 248 <<<
> >>
> >> 4. writePropertyDirectly:
> >> This is a map.put() vs switch(String) battle,
> >> and the map is definitely losing it :)
> >> >>> 126 vs 582 <<<
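
To illustrate the map.put() vs switch(String) contrast in item 4, here is a minimal sketch of the two storage styles. The class names MapBackedArtist and FieldBackedArtist and the two properties are hypothetical, chosen only for illustration; this is not the code Cayenne actually generates (see [2] for that).

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for map-based storage: every property value
// lives in a per-object HashMap keyed by property name.
class MapBackedArtist {
    private final Map<String, Object> values = new HashMap<>();

    Object readPropertyDirectly(String name) {
        return values.get(name);
    }

    void writePropertyDirectly(String name, Object value) {
        values.put(name, value);
    }
}

// Hypothetical stand-in for field-based storage: one field per property,
// with a switch(String) dispatching on the property name.
class FieldBackedArtist {
    private String artistName;
    private Date dateOfBirth;

    Object readPropertyDirectly(String name) {
        if (name == null) {
            return null; // switch on a null String would throw
        }
        switch (name) {
            case "artistName":
                return artistName;
            case "dateOfBirth":
                return dateOfBirth;
            default:
                return null; // unknown property in this sketch
        }
    }

    void writePropertyDirectly(String name, Object value) {
        if (name == null) {
            return;
        }
        switch (name) {
            case "artistName":
                artistName = (String) value;
                break;
            case "dateOfBirth":
                dateOfBirth = (Date) value;
                break;
            default:
                throw new IllegalArgumentException("Unknown property: " + name);
        }
    }
}

final class StorageDemo {
    public static void main(String[] args) {
        FieldBackedArtist artist = new FieldBackedArtist();
        artist.writePropertyDirectly("artistName", "Picasso");
        System.out.println(artist.readPropertyDirectly("artistName"));
    }
}
```

The field-based variant needs no per-object map or per-entry node objects, which is one plausible reason for the memory difference reported above.
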
> >>
> >> [1] https://github.com/apache/cayenne/pull/235
> >> [2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
> >> [3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
> >>
> >> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <johnthuss@gmail.com> wrote:
> >>> I was surprised by the difference in memory too, but this is a small
> >>> diff (apart from the newly generated
> >>> readPropertyDirectly/writePropertyDirectly methods), so there isn't
> >>> anything else going on.  My unverified assumption about HashMap is
> >>> that it doubles in size each time it resizes, so entities with more
> >>> fields could cause more waste. For example, an entity with 65 fields
> >>> would have 63 empty array slots (ignoring fill factor).  So the exact
> >>> savings may vary.
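
The 65-field example can be checked with a little arithmetic. The sketch below assumes a java.util.HashMap created with the default constructor (initial table length 16, load factor 0.75) and filled one key at a time; whether CayenneDataObject creates its map that way is an assumption here. HashMapSlots and its methods are hypothetical helper names.

```java
final class HashMapSlots {

    // Table length when the fill factor is ignored: the smallest power
    // of two that can hold all entries.
    static int nextPowerOfTwo(int entries) {
        int capacity = 1;
        while (capacity < entries) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Table length of a default-constructed HashMap after `entries`
    // distinct keys (at least one) were put one by one: the table starts
    // at 16 and doubles whenever the size exceeds capacity * 0.75.
    static int tableLength(int entries) {
        int capacity = 16;
        while (entries > capacity * 0.75) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int fields = 65;
        System.out.println("ignoring fill factor: "
                + (nextPowerOfTwo(fields) - fields) + " empty slots");
        System.out.println("default load factor:  at least "
                + (tableLength(fields) - fields) + " empty slots");
    }
}
```

Both calculations give a 128-slot table for 65 entries, so at least 63 slots stay empty (more if keys collide). With the load factor taken into account, even 49 entries already force the table to grow to 128 slots.
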
> >>>
> >>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <robert.zeigler@roxanemy.com> wrote:
> >>>
> >>>> I’m also a little surprised at the 1/2-ing… what were the values
> >>>> being stored? I suppose in theory, many values are relatively
> >>>> “small”, memory-wise, so having the overhead of also storing the key
> >>>> could ~double the memory use, but if you’re storing large values, I
> >>>> wouldn’t expect the utilization to drop as dramatically. What were
> >>>> your data values (type and length distribution for strings)?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Robert
> >>>>
> >>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <blacknext@gmail.com> wrote:
> >>>>>
> >>>>> Hi John,
> >>>>>
> >>>>> I'm a little surprised that map-based storage is over 2x worse in
> >>>>> memory consumption.  I'm wondering if there is more going on here
> >>>>> than storage of the property values.  Would it be simple enough to
> >>>>> adapt your test case to compare a list of POJOs vs a list of maps
> >>>>> and see what the memory footprint and difference is that way?
> >>>>>
> >>>>> I personally was thinking the big improvement for using fields
> >>>>> directly is the speed improvement.  I didn't think the memory
> >>>>> consumption difference would be that dramatic.
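
The POJO-vs-map comparison suggested above could be sketched roughly as follows. This is not the test used in the thread: FootprintSketch, its Pojo class and the property names are hypothetical, and heap readings taken through Runtime after a System.gc() hint are only approximate, so the printed figures should be treated as indicative at best.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FootprintSketch {

    // Hypothetical plain object with one field per property.
    static final class Pojo {
        final int id;
        final Date date;
        final String name;

        Pojo(int id, Date date, String name) {
            this.id = id;
            this.date = date;
            this.name = name;
        }
    }

    // Approximate used heap; System.gc() is only a hint to the JVM.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    static List<Pojo> buildPojos(int count) {
        List<Pojo> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(new Pojo(i, new Date(i), "artist-name-" + i));
        }
        return result;
    }

    static List<Map<String, Object>> buildMaps(int count) {
        List<Map<String, Object>> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Map<String, Object> values = new HashMap<>();
            values.put("id", i);
            values.put("date", new Date(i));
            values.put("name", "artist-name-" + i);
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) {
        int count = 10_000;
        long start = usedHeap();
        List<Pojo> pojos = buildPojos(count);
        long afterPojos = usedHeap();
        List<Map<String, Object>> maps = buildMaps(count);
        long afterMaps = usedHeap();
        // Referencing both lists here keeps them reachable while measuring.
        System.out.printf("%d POJOs: ~%d KB%n",
                pojos.size(), (afterPojos - start) / 1024);
        System.out.printf("%d maps:  ~%d KB%n",
                maps.size(), (afterMaps - afterPojos) / 1024);
    }
}
```

A dedicated object-layout or heap-profiling tool would give more trustworthy numbers than this kind of before/after heap reading.
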
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> mrg
> >>>>>
> >>>>>
> >>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <johnthuss@gmail.com> wrote:
> >>>>>
> >>>>>> I did some experimenting recently to see if changes to the way
> >>>>>> data is stored in Cayenne objects could reduce the amount of
> >>>>>> memory they consume.
> >>>>>>
> >>>>>> I chose to use separate fields for each property instead of a
> >>>>>> HashMap (which is what CayenneDataObject uses).  The results were
> >>>>>> very affirming. For my test of loading 10,000 objects from every
> >>>>>> table in my database I got it to use about *half the memory* of
> >>>>>> the default class (from 921 MB down to 431 MB).
> >>>>>>
> >>>>>> I know there has been some discussion already about addressing
> >>>>>> this topic for the next major release, so I thought I'd throw in
> >>>>>> some observations / questions here.
> >>>>>>
> >>>>>> For my implementation I subclassed CayenneDataObject because in
> >>>>>> previous experience I found implementing a replacement to be much
> >>>>>> more difficult and subject to more bugs due to the less frequently
> >>>>>> used code path that PersistentObject and its descriptors take you
> >>>>>> down.  My apps rely on things that are sort of specific to
> >>>>>> CayenneDataObject like Validating.
> >>>>>>
> >>>>>> So one question is how we should be addressing the need that
> >>>>>> people may have to create their own data classes. Right now I
> >>>>>> believe the recommended path is to subclass PersistentObject, but
> >>>>>> I'm not convinced that that is a viable solution without wholesale
> >>>>>> copying most of CayenneDataObject into your subclass.  I'd rather
> >>>>>> see a fuller base class (in addition to keeping PersistentObject
> >>>>>> around) that includes all of CayenneDataObject except the property
> >>>>>> storage (HashMap).
> >>>>>>
> >>>>>> For my implementation I had to modify CayenneDataObject, but only
> >>>>>> slightly to avoid creating the HashMap which I wasn't using.
> >>>>>> However, because the class isn't really intended for customization
> >>>>>> this map is referenced in multiple methods that can't easily be
> >>>>>> overridden to change the way things are stored.
> >>>>>>
> >>>>>> Another approach might be to ask why anyone should need to
> >>>>>> customize the way data is stored in the objects if we can just use
> >>>>>> the best solution possible in the first place?  I can't imagine a
> >>>>>> more efficient representation than fields.  However, fields
> >>>>>> present difficulties for the use case where you aren't generating
> >>>>>> unique classes for your model but just rely on the generic class.
> >>>>>> In theory this could be addressed via runtime code generation or
> >>>>>> something else, but that would be quite a change.
> >>>>>>
> >>>>>> So I'm looking forward to discussing this and toward the future.
> >>>>>>
> >>>>>> John
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Nikita Timofeev
> >
>
>
