apex-dev mailing list archives

From Atri Sharma <atri.j...@gmail.com>
Subject Re: Window Commits
Date Wed, 26 Aug 2015 02:54:26 GMT
Quick question then.

Just to clarify my understanding: does bufferserver dump all of the data
when full, or does it start evicting in LRU fashion on demand?
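
To make the two behaviors in the question concrete, here is a minimal sketch
that contrasts them. It is not bufferserver code: the class BlockBuffer, its
fields, and the policy names "dump_all" and "lru" are all hypothetical, and a
plain dict stands in for the disk.

```python
from collections import OrderedDict


class BlockBuffer:
    """Toy fixed-capacity block store (hypothetical, not bufferserver code)."""

    def __init__(self, capacity, policy):
        self.capacity = capacity      # maximum number of blocks held in memory
        self.policy = policy          # "dump_all" or "lru" (assumed names)
        self.memory = OrderedDict()   # block_id -> data, least recently used first
        self.disk = {}                # stand-in for blocks spilled to disk
        self.spill_sizes = []         # number of blocks written per spill

    def put(self, block_id, data):
        # Make room only when a new block would exceed the capacity.
        if block_id not in self.memory and len(self.memory) >= self.capacity:
            self._make_room()
        self.memory[block_id] = data
        self.memory.move_to_end(block_id)

    def get(self, block_id):
        if block_id in self.memory:
            self.memory.move_to_end(block_id)
            return self.memory[block_id]
        # Not in memory: load it back from "disk" (the round trip).
        data = self.disk.pop(block_id)
        self.put(block_id, data)
        return data

    def _make_room(self):
        if self.policy == "dump_all":
            # Write every in-memory block at once: one large spill.
            self.spill_sizes.append(len(self.memory))
            self.disk.update(self.memory)
            self.memory.clear()
        else:
            # Evict only the least recently used block: one small spill.
            victim, data = self.memory.popitem(last=False)
            self.disk[victim] = data
            self.spill_sizes.append(1)


dump = BlockBuffer(3, "dump_all")
lru = BlockBuffer(3, "lru")
for i in range(5):
    dump.put(i, b"x")
    lru.put(i, b"x")

print(dump.spill_sizes)  # [3]    -> one spill of three blocks
print(lru.spill_sizes)   # [1, 1] -> two spills of one block each
```

In this toy model the dump-all policy produces fewer but larger writes, while
the LRU policy spreads the same pressure over single-block writes. Which one
bufferserver actually does is exactly what the question asks.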
On 26 Aug 2015 03:53, "Chetan Narsude" <chetan@datatorrent.com> wrote:

> I have a hunch that this may cause a problem by adding latency.
> But ultimately, if you strongly believe in the idea, we will use a
> benchmark to confirm or rule out the hunches.
>
> Here is what happens today: bufferserver tries to hold the data in memory
> for as long as possible but not longer than needed. If you do not persist
> the data to disk, you do not have to load it back either, since it's already
> in memory. This greatly reduces the disk-related latency. Even when we have to
> persist the data, we pick the block (a correct implementation is still
> pending) that we will not need back in memory immediately.
>
> The converse is presumably true as well. If you start persisting the
> data in anticipation of the buffer being full, you will also need to load this
> data back when needed. This will result in frequent round trips to disk,
> adding to the latency.
>
> --
> Chetan
>
>
>
>
> On Tue, Aug 25, 2015 at 12:53 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>
> > What are the problems you see around loading? I think that it might
> > actually help since we might end up using locality of reference for
> > similar data in a single window.
> > On 25 Aug 2015 22:14, "Chetan Narsude" <chetan@datatorrent.com> wrote:
> >
> > > This looks at the store side of the equation. What's the impact on the
> > > load side when the time comes to use this data?
> > >
> > > --
> > > Chetan
> > >
> > > On Tue, Aug 25, 2015 at 8:41 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
> > >
> > > > On 25 Aug 2015 10:34, "Vlad Rozov" <v.rozov@datatorrent.com> wrote:
> > > > >
> > > > > > I think that the bufferserver should be allowed to use no more than
> > > > > > the application-specified amount of memory, and behavior like the
> > > > > > Linux file cache will make it difficult to allocate operator/container
> > > > > > cache without reserving too much memory for spikes.
> > > >
> > > > Sure, agreed.
> > > >
> > > > > My idea is to use *less* memory than what is allocated by the
> > > > > application, since I am suggesting some level of control over group
> > > > > commits. So I am thinking of taking the patch you wrote and having it
> > > > > trigger each time the buffer server fills by n units, n being the
> > > > > window size.
> > > > >
> > > > > If n exceeds the allocated memory, we can error out.
> > > >
> > > > Thoughts?
> > > >
> > > > > But I may be wrong, and it would be good to have the suggested behavior
> > > > > implemented in a prototype and to benchmark the prototype's performance.
> > > > >
> > > > > Vlad
> > > > >
> > > > >
> > > > > On 8/24/15 18:24, Atri Sharma wrote:
> > > > >>
> > > > > >> The idea is that if bufferserver dumps *all* pages once it runs out
> > > > > >> of memory, then it's a huge I/O spike. If it starts paging out once it
> > > > > >> runs out of memory, then it behaves like a normal cache and a further
> > > > > >> level of paging control can be applied.
> > > > >>
> > > > > >> My idea is that there should be functionality to control the amount
> > > > > >> of data that is committed together. This also allows me to 1) define
> > > > > >> the optimal way writes work on my disk and 2) let my application
> > > > > >> define locality of data. For example, I might be performing graph
> > > > > >> analysis in which a time window's data consists of a subgraph.
> > > > > >> On 25 Aug 2015 02:46, "Chetan Narsude" <chetan@datatorrent.com> wrote:
> > > > >>
> > > > > >>> The bufferserver writes pages to disk *only when* it runs out of
> > > > > >>> memory to hold them.
> > > > >>>
> > > > >>> Can you elaborate where you see I/O spikes?
> > > > >>>
> > > > >>> --
> > > > >>> Chetan
> > > > >>>
> > > > > >>> On Mon, Aug 24, 2015 at 12:39 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
> > > > >>>
> > > > >>>> Folks,
> > > > >>>>
> > > > > >>>> I was wondering if it makes sense to have a functionality in which
> > > > > >>>> bufferserver writes out data pages to disk in batches defined by
> > > > > >>>> timeslice/application window.
> > > > >>>>
> > > > > >>>> This will allow flexible workloads and reduce I/O spikes (I
> > > > > >>>> understand that we have non-blocking I/O but it still would incur
> > > > > >>>> disk head costs).
> > > > >>>>
> > > > >>>> Thoughts?
> > > > >>>>
> > > > >>>> --
> > > > >>>> Regards,
> > > > >>>>
> > > > >>>> Atri
> > > > >>>> *l'apprenant*
> > > > >>>>
> > > > >
> > > >
> > >
> >
>
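
The proposal earlier in the thread (write pages out in batches of n units, n
being the window size, and error out if n exceeds the allocated memory) could
be sketched roughly as follows. This is only an illustration under stated
assumptions: WindowBatchedWriter, allocated_units, and window_units are
hypothetical names, a "unit" is left abstract, and a list of batches stands in
for the actual disk writes.

```python
class WindowBatchedWriter:
    """Toy writer that flushes in window-sized batches (hypothetical sketch)."""

    def __init__(self, allocated_units, window_units):
        # Per the proposal: a window larger than the allocation is an error.
        if window_units > allocated_units:
            raise ValueError("window size exceeds allocated memory")
        self.window_units = window_units
        self.pending = []   # units buffered in memory, not yet written
        self.batches = []   # each flushed batch, standing in for a disk write

    def append(self, unit):
        self.pending.append(unit)
        # Trigger a group commit each time the buffer fills by n units.
        if len(self.pending) >= self.window_units:
            self.flush()

    def flush(self):
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []


writer = WindowBatchedWriter(allocated_units=8, window_units=3)
for unit in range(7):
    writer.append(unit)

print(writer.batches)  # [[0, 1, 2], [3, 4, 5]]
print(writer.pending)  # [6]
```

The sketch covers only the store side. It says nothing about the load-side
cost raised in the replies, which is what the suggested benchmark would have
to measure.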
