apex-dev mailing list archives

From Atri Sharma <atri.j...@gmail.com>
Subject Re: Window Commits
Date Wed, 26 Aug 2015 16:00:14 GMT
Super. Then I feel the ability to batch blocks for a single commit makes
sense.

BTW, where can I find the code for cache eviction please?
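The batching discussed in this thread could be sketched roughly as follows. This is a hypothetical illustration only, not the actual bufferserver API: all class and method names are invented. Blocks accumulate in memory and are flushed as one group commit once n window-sized units are buffered, and construction errors out when n windows would exceed the application-specified memory budget.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only -- not the actual bufferserver code. Blocks
// accumulate in memory and are flushed as one group commit once
// `batchWindows` window-sized units are buffered.
class WindowBatchedCommit {
    private final int batchWindows;               // n: windows per group commit
    private final Deque<byte[]> pending = new ArrayDeque<>();
    private int flushes = 0;

    WindowBatchedCommit(int windowBytes, int batchWindows, long memoryBudget) {
        if ((long) windowBytes * batchWindows > memoryBudget) {
            // n window-sized units exceed the allocated memory: error out
            throw new IllegalArgumentException("batch exceeds allocated memory");
        }
        this.batchWindows = batchWindows;
    }

    void endWindow(byte[] windowData) {
        pending.addLast(windowData);
        if (pending.size() >= batchWindows) {
            flush();
        }
    }

    private void flush() {
        // A real implementation would issue one sequential write for the
        // whole batch; here we only clear the buffer and count the commit.
        pending.clear();
        flushes++;
    }

    int flushCount() { return flushes; }
    int pendingWindows() { return pending.size(); }
}
```

With batchWindows = 4, feeding nine windows produces two group commits with one window still pending, so writes arrive at the disk in large sequential batches rather than one page at a time.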
On 26 Aug 2015 20:03, "Chetan Narsude" <chetan@datatorrent.com> wrote:

> Lru fashion on demand. One block at a time.
> On Aug 25, 2015 7:54 PM, "Atri Sharma" <atri.jiit@gmail.com> wrote:
>
> > Quick question then.
> >
> > Just to clarify my understanding, does bufferserver dump all of the data
> > when full, or does it start evicting in LRU fashion on demand?
> > On 26 Aug 2015 03:53, "Chetan Narsude" <chetan@datatorrent.com> wrote:
> >
> > > I have a hunch that there may be a problem in terms of added latency.
> > > But ultimately we will use benchmarks to rule out the hunches if you
> > > strongly believe in it.
> > >
> > > Here is what happens today: bufferserver tries to hold the data in
> > > memory for as long as possible, but not longer than needed. If you do
> > > not persist the data to disk, you do not have to load it back, as it's
> > > already in memory. This greatly reduces the disk-related latency. Even
> > > when we have to persist the data, we pick the block (the correct
> > > implementation is pending) which we will not need back in memory
> > > immediately.
> > >
> > > The converse is presumably true as well. If you start persisting the
> > > data in anticipation of the buffer being full, you will also need to
> > > load this data back when needed. This will result in frequent round
> > > trips to disk, adding to the latency.
> > >
> > > --
> > > Chetan
> > >
> > >
> > >
> > >
> > > On Tue, Aug 25, 2015 at 12:53 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
> > >
> > > > What are the problems you see around loading? I think that it might
> > > > actually help, since we might end up using locality of reference for
> > > > similar data in a single window.
> > > > On 25 Aug 2015 22:14, "Chetan Narsude" <chetan@datatorrent.com> wrote:
> > > >
> > > > > This looks at the store side of the equation; what's the impact on
> > > > > the load side when the time comes to use this data?
> > > > >
> > > > > --
> > > > > Chetan
> > > > >
> > > > > On Tue, Aug 25, 2015 at 8:41 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
> > > > >
> > > > > > On 25 Aug 2015 10:34, "Vlad Rozov" <v.rozov@datatorrent.com> wrote:
> > > > > > >
> > > > > > > I think that the bufferserver should be allowed to use no more
> > > > > > > than an application-specified amount of memory, and behavior like
> > > > > > > the Linux file cache will make it difficult to allocate the
> > > > > > > operator/container cache without reserving too much memory for
> > > > > > > spikes.
> > > > > >
> > > > > > Sure, agreed.
> > > > > >
> > > > > > My idea is to use *less* memory than what is allocated by the
> > > > > > application, since I am suggesting some level of control over group
> > > > > > commits. So I am thinking of taking the patch you wrote and having
> > > > > > it trigger each time the buffer server fills by n units, n being
> > > > > > the window size.
> > > > > >
> > > > > > If n exceeds the allocated memory, we can error out.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > But I may be wrong, and it will be good to have the suggested
> > > > > > behavior implemented in a prototype and benchmark its performance.
> > > > > > >
> > > > > > > Vlad
> > > > > > >
> > > > > > >
> > > > > > > On 8/24/15 18:24, Atri Sharma wrote:
> > > > > > >>
> > > > > > >> The idea is that if bufferserver dumps *all* pages once it runs
> > > > > > >> out of memory, then it's a huge I/O spike. If it starts paging
> > > > > > >> out once it runs out of memory, then it behaves like a normal
> > > > > > >> cache and a further level of paging control can be applied.
> > > > > > >>
> > > > > > >> My idea is that there should be functionality to control the
> > > > > > >> amount of data that is committed together. This also allows me
> > > > > > >> to 1) define the optimal way writes work on my disk 2) allow my
> > > > > > >> application to define locality of data. For example, I might be
> > > > > > >> performing graph analysis in which a time window's data consists
> > > > > > >> of a subgraph.
> > > > > > >> On 25 Aug 2015 02:46, "Chetan Narsude" <chetan@datatorrent.com> wrote:
> > > > > > >>
> > > > > > >>> The bufferserver writes pages to disk *only when* it runs out
> > > > > > >>> of memory to hold them.
> > > > > > >>>
> > > > > > >>> Can you elaborate where you see I/O spikes?
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Chetan
> > > > > > >>>
> > > > > > >>> On Mon, Aug 24, 2015 at 12:39 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>>> Folks,
> > > > > > >>>>
> > > > > > >>>> I was wondering if it makes sense to have functionality in
> > > > > > >>>> which bufferserver writes out data pages to disk in batches
> > > > > > >>>> defined by timeslice/application window.
> > > > > > >>>>
> > > > > > >>>> This will allow flexible workloads and reduce I/O spikes (I
> > > > > > >>>> understand that we have non-blocking I/O, but it still would
> > > > > > >>>> incur disk head costs).
> > > > > > >>>> Thoughts?
> > > > > > >>>>
> > > > > > >>>> --
> > > > > > >>>> Regards,
> > > > > > >>>>
> > > > > > >>>> Atri
> > > > > > >>>> *l'apprenant*
> > > > > > >>>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
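The "LRU fashion on demand, one block at a time" behavior Chetan describes could be sketched as below. This is an illustrative toy, not the actual bufferserver eviction code (which is exactly what the question above asks to locate): when the cache goes over budget, exactly one least-recently-used block is spilled, and only at the moment a new block needs the room.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only -- not the actual bufferserver code. When the
// cache exceeds `maxBlocks`, exactly one least-recently-used block is
// evicted (spilled), on demand.
class OneBlockLruCache {
    private final int maxBlocks;
    private int spills = 0;
    private final LinkedHashMap<Integer, byte[]> blocks;

    OneBlockLruCache(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // accessOrder=true keeps iteration order least- to most-recently used
        this.blocks = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                if (size() > OneBlockLruCache.this.maxBlocks) {
                    spills++;     // one block spilled to disk, on demand
                    return true;  // evict exactly this eldest (LRU) block
                }
                return false;
            }
        };
    }

    void put(int blockId, byte[] data) { blocks.put(blockId, data); }
    byte[] get(int blockId) { return blocks.get(blockId); } // touch = recently used
    int spillCount() { return spills; }
    boolean inMemory(int blockId) { return blocks.containsKey(blockId); }
}
```

With a budget of two blocks, inserting blocks 1 and 2, touching block 1, and then inserting block 3 spills only block 2: a single eviction at the moment memory runs out, never a wholesale dump of all pages.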
