cayenne-dev mailing list archives

From Michael Gentry <blackn...@gmail.com>
Subject Re: Cayenne object storage / memory usage
Date Wed, 05 Jul 2017 17:35:31 GMT
That makes much more sense!  That'll teach me to sleep-read.  Well,
probably not.  :-)

These are pretty nice improvements overall.  When is 4.1 coming out?  :-)

Thanks,

mrg


On Wed, Jul 5, 2017 at 1:21 PM, Andrus Adamchik <andrus@objectstyle.org>
wrote:

> >  I'm wondering if you
> > inadvertently switched old vs new in the performance section?  (Since the
> > new, on the right, is always slower.)
>
> The benchmark is million ops per second. So a bigger value is
> better/faster (kind of like RPM in a car).
>
> Andrus
>
> > On Jul 5, 2017, at 7:31 PM, Michael Gentry <blacknext@gmail.com> wrote:
> >
> > Hi Nikita,
> >
> > I saw the pull request and was taking a glance at it, so thanks for
> > following up with an e-mail.
> >
> > The memory improvement looks quite nice, but I'm wondering if you
> > inadvertently switched old vs new in the performance section?  (Since the
> > new, on the right, is always slower.)
> >
> > Thanks,
> >
> > mrg
> >
> >
> > On Wed, Jul 5, 2017 at 10:19 AM, Nikita Timofeev
> > <ntimofeev@objectstyle.com> wrote:
> >
> >> Hi all,
> >>
> >> I've run some additional benchmarks for field-based classes, inspired
> >> by John, and they were so promising that I've moved on
> >> to the implementation.
> >>
> >> So here is a pull request for you to review [1].
> >> Here [2] you can see what the new generated classes will look like.
> >>
> >> I see no visible downsides in this solution: both
> >> memory usage and speed are improved.
> >> All tests are clean, and the only minor incompatibility
> >> is that the HOLLOW state no longer resets the object's values [3]
> >> (though this can be implemented as well, I'm just
> >> not sure it is really needed).
> >>
> >> P.S. Here are some raw numbers from my benchmarks.
> >> I'm giving absolute numbers, but really only their relation is important.
> >> Results for the old version are on the left, for the new version on the right.
> >>
> >> Memory usage:
> >> ==============
> >> 1. 10,000 small objects
> >> (int, Date and String ~ 20 chars)
> >> >>> 6 MB vs 2.5 MB <<<
> >>
> >> 2. 10,000 objects with big values
> >> (int, Date and String ~ 1K chars)
> >> For the same classes (same number of fields) the difference
> >> is just a constant, so this case only gives an idea of
> >> what to expect with larger values.
> >> >>> 24.5 MB vs 21 MB <<<
> >>
> >> Performance:
> >> ==============
> >> (numbers are in millions of ops per sec, measured with a JMH benchmark)
> >> 1. Getter:
> >> >>> 107 vs 177 <<<
> >>
> >> 2. Setter:
> >> Not so impressive, as the Cayenne stack takes most of the
> >> time here to process the graph diff, but the new methods are still better.
> >> >>> 12.5 vs 14.5 <<<
> >>
> >> 3. readPropertyDirectly:
> >> >>> 152 vs 248 <<<
> >>
> >> 4. writePropertyDirectly:
> >> This is a map.put() vs switch(String) battle,
> >> and the map is definitely losing it :)
> >> >>> 126 vs 582 <<<
> >>
> >> [1] https://github.com/apache/cayenne/pull/235
> >> [2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
> >> [3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
> >>
> >> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <johnthuss@gmail.com> wrote:
> >>> I was surprised by the difference in memory too, but this is a small diff
> >>> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
> >>> methods), so there isn't anything else going on.  My unverified assumption
> >>> about HashMap is that it doubles in size each time it resizes, so entities
> >>> with more fields could cause more waste. For example, an entity with 65
> >>> fields would have 63 empty array slots (ignoring fill factor).  So the
> >>> exact savings may vary.
> >>>
> >>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler
> >>> <robert.zeigler@roxanemy.com> wrote:
> >>>
> >>>> I’m also a little surprised at the 1/2-ing… what were the values being
> >>>> stored? I suppose in theory, many values are relatively “small”,
> >>>> memory-wise, so having the overhead of also storing the key could ~double
> >>>> the memory use, but if you’re storing large values, I wouldn’t expect the
> >>>> utilization to drop as dramatically. What were your data values (type and
> >>>> length distribution for strings)?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Robert
> >>>>
> >>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <blacknext@gmail.com> wrote:
> >>>>>
> >>>>> Hi John,
> >>>>>
> >>>>> I'm a little surprised that map-based storage is over 2x worse in memory
> >>>>> consumption.  I'm wondering if there is more going on here than storage of
> >>>>> the property values.  Would it be simple enough to adapt your test case to
> >>>>> compare a list of POJOs vs a list of maps and see what the memory footprint
> >>>>> and difference is that way?
> >>>>>
> >>>>> I personally was thinking the big improvement for using fields directly is
> >>>>> the speed improvement.  I didn't think the memory consumption difference
> >>>>> would be that dramatic.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> mrg
> >>>>>
> >>>>>
> >>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <johnthuss@gmail.com> wrote:
> >>>>>
> >>>>>> I did some experimenting recently to see if changes to the way data is
> >>>>>> stored in Cayenne objects could reduce the amount of memory they consume.
> >>>>>>
> >>>>>> I chose to use separate fields for each property instead of a HashMap
> >>>>>> (which is what CayenneDataObject uses).  The results were very affirming.
> >>>>>> For my test of loading 10,000 objects from every table in my database I got
> >>>>>> it to use about *half the memory* of the default class (from 921 MB
> >>>>>> down to 431 MB).
> >>>>>>
> >>>>>> I know there has been some discussion already about addressing this topic
> >>>>>> for the next major release, so I thought I'd throw in some observations /
> >>>>>> questions here.
> >>>>>>
> >>>>>> For my implementation I subclassed CayenneDataObject because in previous
> >>>>>> experience I found implementing a replacement to be much more difficult and
> >>>>>> subject to more bugs due to the less frequently used code path that
> >>>>>> PersistentObject and its descriptors take you down.  My apps rely on
> >>>>>> things that are sort of specific to CayenneDataObject like Validating.
> >>>>>>
> >>>>>> So one question is how we should be addressing the need that people may
> >>>>>> have to create their own data classes. Right now I believe the recommended
> >>>>>> path is to subclass PersistentObject, but I'm not convinced that that is a
> >>>>>> viable solution without wholesale copying most of CayenneDataObject into
> >>>>>> your subclass.  I'd rather see a fuller base class (in addition to keeping
> >>>>>> PersistentObject around) that includes all of CayenneDataObject except the
> >>>>>> property storage (HashMap).
> >>>>>>
> >>>>>> For my implementation I had to modify CayenneDataObject, but only slightly
> >>>>>> to avoid creating the HashMap which I wasn't using. However, because the
> >>>>>> class isn't really intended for customization this map is referenced in
> >>>>>> multiple methods that can't easily be overridden to change the way things
> >>>>>> are stored.
> >>>>>>
> >>>>>> Another approach might be to ask why anyone should need to customize the
> >>>>>> way data is stored in the objects if we can just use the best solution
> >>>>>> possible in the first place?  I can't imagine a more efficient
> >>>>>> representation than fields.  However, fields present difficulties for the
> >>>>>> use case where you aren't generating unique classes for your model but just
> >>>>>> rely on the generic class.  In theory this could be addressed via runtime
> >>>>>> code generation or something else, but that would be quite a change.
> >>>>>>
> >>>>>> So I'm looking forward to discussing this and toward the future.
> >>>>>>
> >>>>>> John
> >>>>>>
> >>>>
> >>>>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Nikita Timofeev
> >>
>
>

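For readers following the thread, the two storage strategies being compared
can be sketched roughly as below. This is only an illustration, not the code
from the pull request: the class names MapBackedArtist and FieldBackedArtist,
the two properties, and the tableLength helper are all hypothetical. The
helper assumes the documented java.util.HashMap defaults (initial capacity 16,
load factor 0.75, table doubling on resize) to reproduce John's "65 fields,
63 empty slots" estimate.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class StorageSketch {

    /** Map-based storage: every property value lives in a HashMap entry. */
    static class MapBackedArtist {
        private final Map<String, Object> values = new HashMap<>();

        Object readPropertyDirectly(String name) {
            return values.get(name);
        }

        void writePropertyDirectly(String name, Object value) {
            values.put(name, value);
        }
    }

    /** Field-based storage: one plain field per property, dispatched by switch(String). */
    static class FieldBackedArtist {
        private String artistName;   // hypothetical property
        private Date dateOfBirth;    // hypothetical property

        Object readPropertyDirectly(String name) {
            switch (name) {
                case "artistName":  return artistName;
                case "dateOfBirth": return dateOfBirth;
                default: throw new IllegalArgumentException("Unknown property: " + name);
            }
        }

        void writePropertyDirectly(String name, Object value) {
            switch (name) {
                case "artistName":  artistName = (String) value; break;
                case "dateOfBirth": dateOfBirth = (Date) value;  break;
                default: throw new IllegalArgumentException("Unknown property: " + name);
            }
        }
    }

    /**
     * Estimated bucket-array length of a default-constructed HashMap holding
     * the given number of entries: start at 16 and double while the entry
     * count exceeds capacity * 0.75. (An empty map allocates its table lazily;
     * that detail is ignored here.)
     */
    static int tableLength(int entries) {
        int capacity = 16;
        while (entries > capacity * 0.75) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        FieldBackedArtist artist = new FieldBackedArtist();
        artist.writePropertyDirectly("artistName", "Test Artist");
        System.out.println(artist.readPropertyDirectly("artistName"));

        int fields = 65;
        int slots = tableLength(fields);
        System.out.println(slots + " slots, " + (slots - fields) + " unused");
    }
}
```

The map variant also pays for one entry object per property on top of the
bucket array, which is consistent with the roughly constant per-object saving
reported in the memory numbers above.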