ignite-dev mailing list archives

From dsetrak...@apache.org
Subject Re: [IGNITE-5717] improvements of MemoryPolicy default size
Date Fri, 04 Aug 2017 09:44:00 GMT
Hang on. I thought we were talking about offheap size, GC should not be relevant. Am I wrong?

D.

On Aug 4, 2017, at 11:38 AM, Sergey Chugunov <sergey.chugunov@gmail.com> wrote:
>Do you see an obvious way of implementing it?
>
>In Java there is a heap with a GC working on it, so it is possible, for
>instance, to decide to throw an OOM based on some GC metrics.
>
>I may be wrong, but I don't see a mechanism in Ignite that could be used
>right away for such purposes.
>Implementing something without thorough planning brings a huge risk of
>false positives, with nodes stopping when they don't have to.
>
>That's why I think it must be implemented and intensively tested as part
>of a separate ticket.
>
>Thanks,
>Sergey.
>
>On Fri, Aug 4, 2017 at 12:18 PM, <dsetrakyan@apache.org> wrote:
>
>> Without #3, the #1 and #2 make little sense.
>>
>> Why is #3 so difficult?
>>
>> D.
>>
>> On Aug 4, 2017, at 10:46 AM, Sergey Chugunov <sergey.chugunov@gmail.com> wrote:
>> >Dmitriy,
>> >
>> >The last item makes perfect sense to me; one may think of it as an
>> >"OutOfMemoryError" in Java.
>> >However, it looks like such a feature requires considerable effort to
>> >design and implement properly, so I would propose creating a separate
>> >ticket and agreeing upon a target version for it.
>> >
>> >Items #1 and #2 will be implemented under IGNITE-5717. Makes sense?
>> >
>> >Thanks,
>> >Sergey.
>> >
>> >On Thu, Aug 3, 2017 at 4:34 AM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
>> >
>> >> Here is what we should do:
>> >>
>> >>    1. Pick an acceptable number. Does not matter if it is 10% or 50%.
>> >>    2. Print the allocated memory in *BOLD* letters into the log.
>> >>    3. Make sure that Ignite server never hangs due to the low memory issue.
>> >>    We should sense it and kick the node out automatically, again with a
>> >>    *BOLD* message in the log.
>> >>
>> >>  Is this possible?
>> >>
>> >> D.
>> >>
>> >> On Wed, Aug 2, 2017 at 6:09 PM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>> >>
>> >> > My proposal is 10% instead of 80%.
>> >> >
>> >> > On Wed, Aug 2, 2017 at 18:54, Denis Magda <dmagda@apache.org> wrote:
>> >> >
>> >> > > Vladimir, Dmitriy P.,
>> >> > >
>> >> > > Please see inline
>> >> > >
>> >> > > > On Aug 2, 2017, at 7:20 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>> >> > > >
>> >> > > > Denis,
>> >> > > >
>> >> > > > The reason is that the product should not hang the user's computer. How
>> >> > > > else could this be explained? I am a developer. I start Ignite with 1 node,
>> >> > > > 2 nodes, X nodes, and observe how they join the topology. I add one key,
>> >> > > > 10 keys, 1M keys. Then I introduce a bug in an example, load 100M keys
>> >> > > > accidentally, and have to restart the computer. The correct behavior is to
>> >> > > > have a small "maxMemory" by default to avoid that. The user should get an
>> >> > > > exception instead of a hang. E.g. Java's "-Xmx" is typically 25% of RAM,
>> >> > > > a more adequate value compared to Ignite's.
>> >> > > >
>> >> > >
>> >> > > Right, the developer was educated about the Java heap parameters and
>> >> > > limited the overall space, preferring an OOM to the laptop suspension. Who
>> >> > > knows how he got to the point that 25% of RAM should be used. That might
>> >> > > have been deep knowledge of the JVM, or he faced several hangs while
>> >> > > testing the application.
>> >> > >
>> >> > > Anyway, the JVM creators didn't decide to predefine the Java heap as a
>> >> > > static value to avoid situations like the one above. Neither should we as a
>> >> > > platform. Educate people about the Ignite memory behavior like Sun did for
>> >> > > the Java heap, but do not try to solve the lack of knowledge with a
>> >> > > default static memory size.
>> >> > >
>> >> > >
>> >> > > > It doesn't matter whether you use persistence or not. The persistent case
>> >> > > > just makes this flaw more obvious - you have virtually unlimited disk, and
>> >> > > > yet you end up with swapping and a hang when using Ignite with the default
>> >> > > > configuration. As already explained, the problem is not about allocating
>> >> > > > "maxMemory" right away, but about the value of "maxMemory" - it is too big.
>> >> > > >
>> >> > >
>> >> > > How do you know what the default should be then? Why 1 GB? For instance,
>> >> > > if I end up having only 1 GB of free memory left and try to start 2 server
>> >> > > nodes and an application, I will face the laptop suspension again.
>> >> > >
>> >> > > —
>> >> > > Denis
>> >> > >
>> >> > > > "We had this behavior before" is never an argument. The previous offheap
>> >> > > > implementation had a lot of flaws, so let's just forget about it.
>> >> > > >
>> >> > > > On Wed, Aug 2, 2017 at 5:08 PM, Denis Magda <dmagda@apache.org> wrote:
>> >> > > >
>> >> > > >> Sergey,
>> >> > > >>
>> >> > > >> That's expected because, as we found out in this discussion, the
>> >> > > >> allocation works differently depending on whether the persistence is used
>> >> > > >> or not:
>> >> > > >>
>> >> > > >> 1) In-memory mode (the persistence is disabled) - the space will be
>> >> > > >> allocated incrementally until the max threshold is reached. Good!
>> >> > > >>
>> >> > > >> 2) The persistence mode - the whole space (limited by the max threshold)
>> >> > > >> is allocated right away. It's not surprising that your laptop starts
>> >> > > >> choking.
>> >> > > >>
>> >> > > >> So, in my previous response I tried to explain that I can't find any
>> >> > > >> reason why we should adjust 1). Any reasons except for the massive
>> >> > > >> preloading?
>> >> > > >>
>> >> > > >> As for 2), it was a big surprise to discover this after the 2.1 release.
>> >> > > >> We definitely have to fix this somehow.
>> >> > > >>
>> >> > > >> —
>> >> > > >> Denis
>> >> > > >>
>> >> > > >>> On Aug 2, 2017, at 6:59 AM, Sergey Chugunov <sergey.chugunov@gmail.com> wrote:
>> >> > > >>>
>> >> > > >>> Denis,
>> >> > > >>>
>> >> > > >>> Just a simple example from our own codebase: I tried to execute
>> >> > > >>> PersistentStoreExample with default settings and two server nodes, and
>> >> > > >>> the client node froze even on the initial load of data into the grid,
>> >> > > >>> although with one server node the example finishes pretty quickly.
>> >> > > >>>
>> >> > > >>> And my laptop isn't the weakest one and has 16 gigs of memory, but it
>> >> > > >>> cannot deal with it.
>> >> > > >>>
>> >> > > >>>
>> >> > > >>> On Wed, Aug 2, 2017 at 4:58 PM, Denis Magda <dmagda@apache.org> wrote:
>> >> > > >>>
>> >> > > >>>>> As far as allocating 80% of available RAM - I was against this even
>> >> > > >>>>> for In-memory mode and still think that this is a wrong default.
>> >> > > >>>>> Looking at free RAM is even worse because it gives you undefined
>> >> > > >>>>> behavior.
>> >> > > >>>>
>> >> > > >>>> Guys, I cannot understand how the high-level behavior of this dynamic
>> >> > > >>>> memory allocation (with the persistence DISABLED) is different from the
>> >> > > >>>> legacy off-heap memory we had in 1.x. Both off-heap memories allocate
>> >> > > >>>> the space on demand; the current one just does this more aggressively,
>> >> > > >>>> requesting big chunks.
>> >> > > >>>>
>> >> > > >>>> Next, the legacy one was unlimited by default, and the user could start
>> >> > > >>>> as many nodes as he wanted on a laptop and preload as much data as he
>> >> > > >>>> needed. Sure, he could bring down the laptop if too many entries were
>> >> > > >>>> injected into the local cluster. But that's about too massive
>> >> > > >>>> preloading and is not caused by the ability of the legacy off-heap
>> >> > > >>>> memory to grow infinitely. The same preloading would cause a hang if
>> >> > > >>>> the Java heap memory mode is used.
>> >> > > >>>>
>> >> > > >>>> The upshot is that the massive preloading of data on the local laptop
>> >> > > >>>> should not be fixed by repealing the dynamic memory allocation.
>> >> > > >>>> Is there any other reason why we have to use the static memory
>> >> > > >>>> allocation for the case when the persistence is disabled? I think the
>> >> > > >>>> case with the persistence should be reviewed separately.
>> >> > > >>>>
>> >> > > >>>> —
>> >> > > >>>> Denis
>> >> > > >>>>
>> >> > > >>>>> On Aug 2, 2017, at 12:45 AM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
>> >> > > >>>>>
>> >> > > >>>>> Dmitriy,
>> >> > > >>>>>
>> >> > > >>>>> The reason behind this is the need to be able to evict pages to disk
>> >> > > >>>>> and load them back, thus we need to preserve a PageId->Pointer mapping
>> >> > > >>>>> in memory. In order to do this in the most efficient way, we need to
>> >> > > >>>>> know in advance all the address ranges we work with. We can add
>> >> > > >>>>> dynamic memory extension for the persistence-enabled config, but this
>> >> > > >>>>> will add yet another step of indirection when resolving every page
>> >> > > >>>>> address, which adds a noticeable performance penalty.
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>>
>> >> > > >>>>> 2017-08-02 10:37 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>> >> > > >>>>>
>> >> > > >>>>>> On Wed, Aug 2, 2017 at 9:33 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>> >> > > >>>>>>
>> >> > > >>>>>>> Dima,
>> >> > > >>>>>>>
>> >> > > >>>>>>> Probably the folks who worked closely with storage know why.
>> >> > > >>>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>> Without knowing why, how can we make a decision?
>> >> > > >>>>>>
>> >> > > >>>>>> Alexey Goncharuk, was it you who made the decision about not using
>> >> > > >>>>>> increments? Do you remember what the reason was?
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>>>
>> >> > > >>>>>>> The real problem is that before being started once in a production
>> >> > > >>>>>>> environment, Ignite will typically be started a hundred times in a
>> >> > > >>>>>>> developer's environment. I think that the default should be ~10% of
>> >> > > >>>>>>> total RAM.
>> >> > > >>>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>> Why not 80% of *free* RAM?
>> >> > > >>>>>>
>> >> > > >>>>>>
>> >> > > >>>>>>>
>> >> > > >>>>>>> On Wed, Aug 2, 2017 at 10:21 AM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
>> >> > > >>>>>>>
>> >> > > >>>>>>>> On Wed, Aug 2, 2017 at 7:27 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>> >> > > >>>>>>>>
>> >> > > >>>>>>>>> Please see Sergey's original message - when persistence is enabled,
>> >> > > >>>>>>>>> memory is not allocated incrementally; maxSize is used.
>> >> > > >>>>>>>>>
>> >> > > >>>>>>>>
>> >> > > >>>>>>>> Why?
>> >> > > >>>>>>>>
>> >> > > >>>>>>>>
>> >> > > >>>>>>>>> Default settings must allow for normal work in a developer's
>> >> > > >>>>>>>>> environment.
>> >> > > >>>>>>>>>
>> >> > > >>>>>>>>
>> >> > > >>>>>>>> Agree, but why not in increments?
>> >> > > >>>>>>>>
>> >> > > >>>>>>>>
>> >> > > >>>>>>>>>
>> >> > > >>>>>>>>> On Wed, Aug 2, 2017 at 1:10, Denis Magda <dmagda@apache.org> wrote:
>> >> > > >>>>>>>>>
>> >> > > >>>>>>>>>>> Why not allocate in increments automatically?
>> >> > > >>>>>>>>>>
>> >> > > >>>>>>>>>> This is exactly how the allocation works right now. The memory will
>> >> > > >>>>>>>>>> grow incrementally until the max size is reached (80% of RAM by
>> >> > > >>>>>>>>>> default).
>> >> > > >>>>>>>>>>
>> >> > > >>>>>>>>>> —
>> >> > > >>>>>>>>>> Denis
>> >> > > >>>>>>>>>>
>> >> > > >>>>>>>>>>> On Aug 1, 2017, at 3:03 PM, dsetrakyan@apache.org wrote:
>> >> > > >>>>>>>>>>>
>> >> > > >>>>>>>>>>> Vova, 1GB seems a bit too small to me, and frankly I do not want to
>> >> > > >>>>>>>>>>> guess. Why not allocate in increments automatically?
>> >> > > >>>>>>>>>>>
>> >> > > >>>>>>>>>>> D.
>> >> > > >>>>>>>>>>>
>> >> > > >>>>>>>>>>> On Aug 1, 2017, at 11:03 PM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
>> >> > > >>>>>>>>>>>> Denis,
>> >> > > >>>>>>>>>>>> No doubt you haven't heard about it - AI 2.1 with persistence,
>> >> > > >>>>>>>>>>>> where 80% of RAM is allocated right away, was released several
>> >> > > >>>>>>>>>>>> days ago. How many users do you think have tried it already?
>> >> > > >>>>>>>>>>>>
>> >> > > >>>>>>>>>>>> Guys,
>> >> > > >>>>>>>>>>>> Do you really think allocating 80% of available RAM is a normal
>> >> > > >>>>>>>>>>>> thing? Take your laptop and check how much available RAM you have
>> >> > > >>>>>>>>>>>> right now. Do you fit into the remaining 20%? If not, then running
>> >> > > >>>>>>>>>>>> AI with persistence with all defaults will bring your machine
>> >> > > >>>>>>>>>>>> down. This is insane. We should allocate no more than 1Gb, so that
>> >> > > >>>>>>>>>>>> the user can play with it without any problems.
>> >> > > >>>>>>>>>>>>
>> >> > > >>>>>>>>>>>> On Tue, Aug 1, 2017 at 10:26 PM, Denis Magda <dmagda@apache.org> wrote:
>> >> > > >>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>> My vote goes for option #1 too. I don't think that 80% is so
>> >> > > >>>>>>>>>>>>> aggressive that we need to bring it down.
>> >> > > >>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>> IGNITE-5717 was created to fix the issue of the 80% RAM
>> >> > > >>>>>>>>>>>>> allocation on 64 bit systems when Ignite works on top of a 32 bit
>> >> > > >>>>>>>>>>>>> JVM. I've not heard of any other complaints regarding the default
>> >> > > >>>>>>>>>>>>> allocation size.
>> >> > > >>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>> —
>> >> > > >>>>>>>>>>>>> Denis
>> >> > > >>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>> On Aug 1, 2017, at 10:58 AM, dsetrakyan@apache.org wrote:
>> >> > > >>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>> I prefer option #1.
>> >> > > >>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>> D.
>> >> > > >>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>> On Aug 1, 2017, at 11:20 AM, Sergey Chugunov <sergey.chugunov@gmail.com> wrote:
>> >> > > >>>>>>>>>>>>>>> Folks,
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> I would like to get back to the question about MemoryPolicy
>> >> > > >>>>>>>>>>>>>>> maxMemory defaults.
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> Although MemoryPolicy may be configured with initial and
>> >> > > >>>>>>>>>>>>>>> maxMemory settings, when persistence is used MemoryPolicy always
>> >> > > >>>>>>>>>>>>>>> allocates the maxMemory size for performance reasons.
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> As the default size of maxMemory is 80% of physical memory, it
>> >> > > >>>>>>>>>>>>>>> causes OOMEs on 32 bit platforms (either on OS or JVM level) and
>> >> > > >>>>>>>>>>>>>>> hurts performance in setups where multiple Ignite nodes are
>> >> > > >>>>>>>>>>>>>>> started on the same physical server.
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> I suggest rethinking these defaults and switching to other
>> >> > > >>>>>>>>>>>>>>> options:
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> - Check whether the platform is 32 or 64 bits and adapt the
>> >> > > >>>>>>>>>>>>>>> defaults. In this case we still need to address the issue with
>> >> > > >>>>>>>>>>>>>>> multiple nodes on one machine even on 64 bit systems.
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> - Lower defaults for maxMemory and allocate, for instance,
>> >> > > >>>>>>>>>>>>>>> max(0.3 * availableMemory, 1Gb).
>> >> > > >>>>>>>>>>>>>>> This option allows us to solve all issues with starting on 32
>> >> > > >>>>>>>>>>>>>>> bit platforms and reduce instability with multiple nodes on the
>> >> > > >>>>>>>>>>>>>>> same machine.
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> Thoughts and/or other options?
>> >> > > >>>>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>>> Thanks,
>> >> > > >>>>>>>>>>>>>>> Sergey.
>> >> > > >>>>>>>>>>>>>
>> >> > > >>>>>>>>>>>>>
>> >> > > >>>>>>>>>>
>> >> > > >>>>>>>>>>
>> >> > > >>>>>>>>>
>> >> > > >>>>>>>>
>> >> > > >>>>>>>
>> >> > > >>>>>>
>> >> > > >>>>
>> >> > > >>>>
>> >> > > >>
>> >> > > >>
>> >> > >
>> >> > >
>> >> >
>> >>
>>
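
For reference, the arithmetic behind the candidate defaults quoted above can be sketched in a few lines of Java. This is only an illustration, not Ignite code: the class MaxMemoryDefaults and its methods are hypothetical names, "availableMemory" is assumed to mean total physical RAM, and the 16 GB machine with two nodes is taken from the laptop example in the thread. It compares the 80% default, the 10% proposal and max(0.3 * availableMemory, 1Gb), and checks whether two nodes that each allocate maxMemory up front (the persistence case) would fit in RAM.

```java
public final class MaxMemoryDefaults {
    /** One gigabyte (2^30 bytes). */
    public static final long GB = 1024L * 1024L * 1024L;

    private MaxMemoryDefaults() {}

    /** Returns the given percentage of a byte count, using integer math. */
    public static long percentOf(long bytes, int percent) {
        return bytes * percent / 100;
    }

    /** Lowered default suggested in the thread: max(0.3 * availableMemory, 1Gb). */
    public static long loweredDefault(long availableBytes) {
        return Math.max(percentOf(availableBytes, 30), GB);
    }

    /** True if the given number of nodes, each allocating perNodeBytes up front, fits into RAM. */
    public static boolean fitsInRam(long perNodeBytes, int nodes, long physicalBytes) {
        return perNodeBytes * nodes <= physicalBytes;
    }

    public static void main(String[] args) {
        long ram = 16 * GB; // assumption: the 16 GB laptop mentioned in the thread
        int nodes = 2;      // assumption: two server nodes on one machine

        long[] candidates = { percentOf(ram, 80), percentOf(ram, 10), loweredDefault(ram) };
        String[] labels = { "80% of RAM", "10% of RAM", "max(0.3 * RAM, 1Gb)" };

        for (int i = 0; i < candidates.length; i++) {
            System.out.printf("%-20s per node = %d MB, %d nodes fit: %b%n",
                labels[i], candidates[i] / (1024 * 1024), nodes,
                fitsInRam(candidates[i], nodes, ram));
        }
    }
}
```

Under these assumptions, two nodes at 80% each do not fit into 16 GB, while the 10% and max(0.3 * RAM, 1Gb) candidates do. The check ignores everything else running on the machine, so it is an upper bound on what fits, not a recommendation.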
