apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "chandni@datatorrent.com" <chan...@datatorrent.com>
Subject Re: Stray folders under Malhar/lib
Date Fri, 04 Sep 2015 01:01:41 GMT
Chetan,

I think you have mis understood my comments. Since I proposed a change in StramLocalCluster
which is in platform it is quite evident that I do not want the application test developer
to be concerned about StorageAgent.

I dont see a use of async writer in local mode. In fact 2 step writes will only make the tests
slower.

Chandni

----- Reply message -----
From: "Chetan Narsude" <chetan@datatorrent.com>
To: <dev@apex.incubator.apache.org>
Subject: Stray folders under Malhar/lib
Date: Thu, Sep 3, 2015 5:13 PM

The problem is the same as BufferServer (BS) needing to write the temp
files. The BS does not write the temp files in stray locations even though
each invocation of StramLocalCluster triggers BS initialization. It checks
with the context for a temporary location automatically to decide a
temporary file location and uses it.

The  AsyncFSStorageAgent needs to follow the exact same pattern now like
bufferserver to write
to the temp location. So while writing the tests one does not have to
specify special location or the code platform executes does not have to be
different in test mode and in the prod mode. So both Thomas and Ram get
what they want.

Chandni - the StorageAgent is an abstraction. The test developer should not
care about whether it's doing async or sync. That should be handled by
context and ideally the code that executes in both the cases should be the
same. Practically you will have some differences. Needless to say the fewer
the better.

--
Chetan


The context decides the behavior of the code. Gaurav is almost done with
the
On Thu, Sep 3, 2015 at 3:51 PM, Munagala Ramanath <ram@datatorrent.com>
wrote:

> Yes Thomas, when I run my app with *dtcli *I'm seeing all those stray
> directories.
>
> I agree that re-configuring a whole bunch of existing tests is not
> something we want to do.
>
> The key question seems to be: *Can we do something in the platform so that
> both objectives are achieved*, namely:
> (a) Existing unit tests need no re-configuration.
> (b) Neither running unit tests nor running a user application via dtcli
> leaves random directories lying around.
>
> I don't yet know enough about the platform to answer that question but it
> seems like there should be something
> we can do that is not too onerous.
>
> Ram
>
> On Thu, Sep 3, 2015 at 3:35 PM, Thomas Weise <thomas@datatorrent.com>
> wrote:
>
> > Ram,
> >
> > Are you referring to running your app from the dtcli?
> >
> > That's one more item to check to not end up with stray files. There is
> > nothing stopping us from using a different default for that use case. My
> > take is that we should structure this in a way where by default unit
> tests
> > have minimum things to configure and run as fast as reasonably possible.
> >
> > Thomas
> >
> > On Thu, Sep 3, 2015 at 3:19 PM, Munagala Ramanath <ram@datatorrent.com>
> > wrote:
> >
> > > It's not just unit tests.
> > >
> > > An app developer is likely to run a random app in LM to uncover bugs
> > before
> > > hitting the cluster.
> > > The closer the LM setup is to the cluster setup (i.e. running as much
> of
> > > the same code as reasonably possible)
> > > the higher the probability that bugs will be hit in LM.
> > >
> > > Ram
> > >
> > > On Thu, Sep 3, 2015 at 3:13 PM, Chandni Singh <chandni@datatorrent.com
> >
> > > wrote:
> > >
> > > > Yes Chetan, I am claiming that :-)
> > > >
> > > > I still don't understand the need for having two step checkpointing
> in
> > > > LocalMode by default.
> > > >
> > > > StramLocalCluster should simplify test execution environment as
> pointed
> > > out
> > > > by Thomas.
> > > > Async checkpoint should have its own test cases using
> StramLocalCluster
> > > > that should not break when new features are added to the platform.
> > > > But by default I think StramLocalCluster should use synchronous
> > > > checkpointing.
> > > >
> > > > -Chandni
> > > >
> > > > On Thu, Sep 3, 2015 at 2:43 PM, Chetan Narsude <
> chetan@datatorrent.com
> > >
> > > > wrote:
> > > >
> > > > > Changing the storage agent is one of the ways to address the
> symptoms
> > > of
> > > > > the problem. But it's not treating the problem.
> > > > >
> > > > > In this case - change the basePath to a location under target and
> all
> > > the
> > > > > opinions are moot. And someone is claiming that we should not do
> it.
> > > Not
> > > > > sure why. Or is anyone claiming that anymore?
> > > > >
> > > > > --
> > > > > Chetan
> > > > >
> > > > > On Thu, Sep 3, 2015 at 1:43 PM, Thomas Weise <
> thomas@datatorrent.com
> > >
> > > > > wrote:
> > > > >
> > > > >> Good point regarding the coverage. These JUnit tests are supposed
> to
> > > > test
> > > > >> individual components and all the tests collectively should strive
> > to
> > > > >> achieve high coverage. There are tests in Apex to cover storage
> > > agents,
> > > > >> recovery semantics etc.  Components that fall outside of the
test
> > > scope
> > > > >> are
> > > > >> reduced as much as possible through mocks (even though there
is
> room
> > > for
> > > > >> improvement).
> > > > >>
> > > > >> The tests in Malhar are for operators and applications, not for
> the
> > > > >> engine.
> > > > >> In those cases where LM is used, the intention is to test the
> > > > application
> > > > >> functionality. It is expected that certain configurations are
> > adjusted
> > > > for
> > > > >> the test and dependencies mocked.
> > > > >>
> > > > >> For the local mode, it should not be an issue to use a different
> > > storage
> > > > >> agent when it simplifies the test execution. Specifically, in
this
> > > case,
> > > > >> we
> > > > >> don't want to go and change many tests to make something work
that
> > > isn't
> > > > >> needed. LM is not "production", it is not using HDFS and there
> are a
> > > > >> number
> > > > >> of other important differences that make it possible to run within
> > the
> > > > >> IDE.
> > > > >>
> > > > >> Instead, focus should be on those things that help with app
> > coverage.
> > > > For
> > > > >> example, in the past we had seen issues with serialization of
> > > operators
> > > > >> that were not uncovered in LM, until we made the serialization
> part
> > of
> > > > the
> > > > >> execution, even when it was not needed for execution.
> > > > >>
> > > > >> Thomas
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Sep 3, 2015 at 11:21 AM, Chetan Narsude <
> > > chetan@datatorrent.com
> > > > >
> > > > >> wrote:
> > > > >>
> > > > >> > I think Ram explained in a little more detail on what I
am
> > thinking.
> > > > >> >
> > > > >> > Tests are supposed to provide code coverage. Having localcluster
> > is
> > > > >> already
> > > > >> > a variable, it's not what runs in production. Having a different
> > > > storage
> > > > >> > agent is another variable and it misses out on testing the
> > > > asynchronous
> > > > >> > flow. The gap keeps on increasing if we continue to do that.
> > > > AsyncFSSA
> > > > >> is
> > > > >> > our default because it's supposed to do everything that
> > > FSStorageAgent
> > > > >> does
> > > > >> > and some more. So not clear as to why the test which creates
> stray
> > > > >> folders
> > > > >> > is not configuring the storage agent properly instead of
> > completely
> > > > >> > changing it out which brings some other problems in as I
just
> > > > explained.
> > > > >> >
> > > > >> > If changing the storage agent is the only way to fix the
problem
> > > with
> > > > >> > reasonable effort, then I would concede. I highly doubt
that.
> > > > >> >
> > > > >> > --
> > > > >> > Chetan
> > > > >> >
> > > > >> > On Thu, Sep 3, 2015 at 11:05 AM, Chandni Singh <
> > > > chandni@datatorrent.com
> > > > >> >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > The local mode was so far using FSStorageAgent which
was used
> in
> > > > >> > > production.
> > > > >> > > In production using Async is needed because hdfs writes
are
> slow
> > > but
> > > > >> is
> > > > >> > > that the case with LocalMode?
> > > > >> > >
> > > > >> > > In local mode if we use Async we are creating checkpoints
> under
> > > one
> > > > >> local
> > > > >> > > directory and then copying it to another local directory
which
> > > will
> > > > >> not
> > > > >> > > improve any performance.
> > > > >> > >
> > > > >> > > In my opinion StramLocalCluster use synchronous checkpointing
> as
> > > > >> default.
> > > > >> > >
> > > > >> > > Chandni
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Thu, Sep 3, 2015 at 10:09 AM, Chetan Narsude <
> > > > >> chetan@datatorrent.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > >> That sounds a lot like self contradicting reason;
Let's make
> a
> > > > change
> > > > >> > >> because we don't want to make change. :-)
> > > > >> > >>
> > > > >> > >> The code is in certain state. This certain state
is
> consistent
> > > with
> > > > >> how
> > > > >> > >> things run in production. In test environment there
is a
> > problem
> > > > that
> > > > >> > stray
> > > > >> > >> files are created. It's a small fix to relocate
these files
> > > > >> elsewhere.
> > > > >> > What
> > > > >> > >> I am trying to understand is that is not being
done?
> > > > >> > >>
> > > > >> > >> --
> > > > >> > >> Chetan
> > > > >> > >>
> > > > >> > >> On Thu, Sep 3, 2015 at 9:41 AM, Thomas Weise <
> > > > thomas@datatorrent.com
> > > > >> >
> > > > >> > >> wrote:
> > > > >> > >>
> > > > >> > >>> There is no need to configure anything extra
with the
> proposed
> > > > >> change,
> > > > >> > it
> > > > >> > >>> just brings back LM to how it worked before.
> > > > >> > >>>
> > > > >> > >>> There is no point modifying n tests for extra
setup with no
> > > gain.
> > > > >> > >>>
> > > > >> > >>> Thomas
> > > > >> > >>>
> > > > >> > >>> On Thu, Sep 3, 2015 at 9:14 AM, Chetan Narsude
<
> > > > >> chetan@datatorrent.com
> > > > >> > >
> > > > >> > >>> wrote:
> > > > >> > >>>
> > > > >> > >>> > Why does it matter that AsyncFSStorageAgent
is being used
> > with
> > > > >> > >>> > LocalCluster? It using the localfs and
hence no gain is
> the
> > > > >> > >>> implementation
> > > > >> > >>> > detail that's abstracted out by FileSystem
already.
> > > > >> > >>> >
> > > > >> > >>> > If there is a problem of random artifacts
left behind
> after
> > > the
> > > > >> test,
> > > > >> > >>> there
> > > > >> > >>> > is a reason and most likely it's misconfiguration
of the
> > > > >> > StorageAgent.
> > > > >> > >>> Why
> > > > >> > >>> > wouldn't that be fixed.
> > > > >> > >>> >
> > > > >> > >>> > --
> > > > >> > >>> > Chetan
> > > > >> > >>> >
> > > > >> > >>> >
> > > > >> > >>> > On Thu, Sep 3, 2015 at 8:59 AM, Amol Kekre
<
> > > > amol@datatorrent.com>
> > > > >> > >>> wrote:
> > > > >> > >>> >
> > > > >> > >>> > > Clean up container files left over
should be a
> distributed
> > > OS
> > > > >> task.
> > > > >> > >>> Clean
> > > > >> > >>> > > up, back up, archive, ... all is
for the OS (aka YARN).
> We
> > > > must
> > > > >> > >>> assume
> > > > >> > >>> > kill
> > > > >> > >>> > > -9.
> > > > >> > >>> > >
> > > > >> > >>> > > The only thing where the operator
comes into play is
> > > > >> "teardown()",
> > > > >> > >>> which
> > > > >> > >>> > is
> > > > >> > >>> > > business logic (not Apex engine)
issue. This could be db
> > > > >> connection
> > > > >> > >>> etc.
> > > > >> > >>> > >
> > > > >> > >>> > > Thks,
> > > > >> > >>> > > Amol
> > > > >> > >>> > >
> > > > >> > >>> > > On Thu, Sep 3, 2015 at 8:52 AM, Thomas
Weise <
> > > > >> > thomas@datatorrent.com
> > > > >> > >>> >
> > > > >> > >>> > > wrote:
> > > > >> > >>> > >
> > > > >> > >>> > > > When the container gets killed,
we should not assume
> > > > anything
> > > > >> > about
> > > > >> > >>> > > > cleanup. It can be a kill -9.
Any related "cleanup"
> > falls
> > > > >> under
> > > > >> > >>> nice to
> > > > >> > >>> > > > have, no guarantees.
> > > > >> > >>> > > >
> > > > >> > >>> > > > On Thu, Sep 3, 2015 at 8:49
AM, Chandni Singh <
> > > > >> > >>> chandni@datatorrent.com
> > > > >> > >>> > >
> > > > >> > >>> > > > wrote:
> > > > >> > >>> > > >
> > > > >> > >>> > > > > I have a question regarding
what Gaurav mentioned
> > > > >> > >>> > > > > ----
> > > > >> > >>> > > > > When container runs in
cluster, "." specifies the
> > > > containers
> > > > >> > >>> local
> > > > >> > >>> > path
> > > > >> > >>> > > > on
> > > > >> > >>> > > > > the node where container
specific jars and other
> > > resources
> > > > >> > >>> resides.
> > > > >> > >>> > It
> > > > >> > >>> > > > > creates a folder under
that which is live as long as
> > > > >> container
> > > > >> > >>> lives.
> > > > >> > >>> > > So
> > > > >> > >>> > > > > there are no vagrant folders
anywhere
> > > > >> > >>> > > > > ---
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > When the container gets
killed, do we cleanup the
> > > folders
> > > > >> > >>> created by
> > > > >> > >>> > > > Async
> > > > >> > >>> > > > > under the containers working
dir?
> > > > >> > >>> > > > >
> > > > >> > >>> > > > > On Thu, Sep 3, 2015 at
8:42 AM, Thomas Weise <
> > > > >> > >>> thomas@datatorrent.com
> > > > >> > >>> > >
> > > > >> > >>> > > > > wrote:
> > > > >> > >>> > > > >
> > > > >> > >>> > > > >> It makes sense to use
the synchronous checkpointing
> > for
> > > > the
> > > > >> > >>> local
> > > > >> > >>> > > mode.
> > > > >> > >>> > > > >> LM is meant to simplify
dependencies and setup. The
> > > > default
> > > > >> > for
> > > > >> > >>> > > > execution
> > > > >> > >>> > > > >> on YARN remains async.
> > > > >> > >>> > > > >>
> > > > >> > >>> > > > >> Thomas
> > > > >> > >>> > > > >>
> > > > >> > >>> > > > >>
> > > > >> > >>> > > > >> On Thu, Sep 3, 2015
at 8:34 AM, Chandni Singh <
> > > > >> > >>> > > chandni@datatorrent.com>
> > > > >> > >>> > > > >> wrote:
> > > > >> > >>> > > > >>
> > > > >> > >>> > > > >>> APPLICATION_PATH
isn't related to local base dir
> of
> > > > Async
> > > > >> as
> > > > >> > >>> far
> > > > >> > >>> > as I
> > > > >> > >>> > > > >>> know. StramLocalCluster
sets the APP_PATH to
> > > > "target/...".
> > > > >> > >>> > > > >>> StramLocalCluster
should use FSStorageAgent.
> > > > >> > >>> > > > >>>
> > > > >> > >>> > > > >>> - Chandni
> > > > >> > >>> > > > >>>
> > > > >> > >>> > > > >>> On Thu, Sep 3,
2015 at 8:20 AM, Gaurav Gupta <
> > > > >> > >>> > gaurav@datatorrent.com
> > > > >> > >>> > > >
> > > > >> > >>> > > > >>> wrote:
> > > > >> > >>> > > > >>>
> > > > >> > >>> > > > >>>> As Thomas mentioned
as default remains to be
> async.
> > > You
> > > > >> can
> > > > >> > >>> either
> > > > >> > >>> > > > >>>> change the
storage agent or set the
> > APPLICATION_PATH.
> > > > >> > >>> > > > >>>>
> > > > >> > >>> > > > >>>> When container
runs in cluster, "." specifies the
> > > > >> containers
> > > > >> > >>> local
> > > > >> > >>> > > > path
> > > > >> > >>> > > > >>>> on the node
where container specific jars and
> other
> > > > >> > resources
> > > > >> > >>> > > > resides. It
> > > > >> > >>> > > > >>>> creates a folder
under that which is live as long
> > as
> > > > >> > container
> > > > >> > >>> > > lives.
> > > > >> > >>> > > > So
> > > > >> > >>> > > > >>>> there are no
vagrant folders anywhere
> > > > >> > >>> > > > >>>>
> > > > >> > >>> > > > >>>> Thanks
> > > > >> > >>> > > > >>>> -Gaurav
> > > > >> > >>> > > > >>>>
> > > > >> > >>> > > > >>>> On Wed, Sep
2, 2015 at 11:33 PM, Chandni Singh <
> > > > >> > >>> > > > chandni@datatorrent.com
> > > > >> > >>> > > > >>>> > wrote:
> > > > >> > >>> > > > >>>>
> > > > >> > >>> > > > >>>>> I think
there is a problem in the default Async
> as
> > > > >> well. It
> > > > >> > >>> also
> > > > >> > >>> > > uses
> > > > >> > >>> > > > >>>>> the working
directory as its local base path.
> > > > >> > >>> > > > >>>>>
> > > > >> > >>> > > > >>>>> In the
Async -> copyToHdfs()  method, we delete
> > the
> > > > >> window
> > > > >> > >>> files
> > > > >> > >>> > > but
> > > > >> > >>> > > > >>>>> the folder
with the operator name never gets
> > > deleted.
> > > > >> > >>> > > > >>>>> So on the
cluster there  will be such vagrant
> > > folders
> > > > in
> > > > >> > the
> > > > >> > >>> > > working
> > > > >> > >>> > > > >>>>> directory?
> > > > >> > >>> > > > >>>>>
> > > > >> > >>> > > > >>>>> On Wed,
Sep 2, 2015 at 11:17 PM, Thomas Weise <
> > > > >> > >>> > > > thomas@datatorrent.com>
> > > > >> > >>> > > > >>>>> wrote:
> > > > >> > >>> > > > >>>>>
> > > > >> > >>> > > > >>>>>> Chandni,
> > > > >> > >>> > > > >>>>>>
> > > > >> > >>> > > > >>>>>> Agreed.
See whether the tests work with the
> > > > synchronous
> > > > >> > >>> storage
> > > > >> > >>> > > > >>>>>> agent.
If yes, change them. The default needs
> to
> > > > remain
> > > > >> > >>> async.
> > > > >> > >>> > > > >>>>>>
> > > > >> > >>> > > > >>>>>> Thomas
> > > > >> > >>> > > > >>>>>>
> > > > >> > >>> > > > >>>>>>
> > > > >> > >>> > > > >>>>>> On
Wed, Sep 2, 2015 at 11:05 PM, Chandni Singh
> <
> > > > >> > >>> > > > >>>>>> chandni@datatorrent.com>
wrote:
> > > > >> > >>> > > > >>>>>>
> > > > >> > >>> > > > >>>>>>>
Hi,
> > > > >> > >>> > > > >>>>>>>
> > > > >> > >>> > > > >>>>>>>
I would like to know what was the reason to
> use
> > > > >> > >>> > > AsyncFSStorageAgent
> > > > >> > >>> > > > >>>>>>>
with StramLocalCluster?
> > > > >> > >>> > > > >>>>>>>
StramLocalCluster is mainly for testing in a
> > > > >> > >>> non-distributed
> > > > >> > >>> > mode
> > > > >> > >>> > > > >>>>>>>
and I am unclear how AsyncFSStorageAgent is
> > > helpful
> > > > in
> > > > >> > this
> > > > >> > >>> > mode.
> > > > >> > >>> > > > >>>>>>>
> > > > >> > >>> > > > >>>>>>>
Thanks,
> > > > >> > >>> > > > >>>>>>>
Chandni
> > > > >> > >>> > > > >>>>>>>
> > > > >> > >>> > > > >>>>>>>
On Wed, Sep 2, 2015 at 10:45 PM, Chandni
> Singh <
> > > > >> > >>> > > > >>>>>>>
chandni@datatorrent.com> wrote:
> > > > >> > >>> > > > >>>>>>>
> > > > >> > >>> > > > >>>>>>>>
This is because of recent changes to
> > > > >> StramLocalCluster
> > > > >> > >>> where
> > > > >> > >>> > > > >>>>>>>>
AsyncFSStorageAgent is used for checkpointing
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
> dag.setAttribute(OperatorContext.STORAGE_AGENT,
> > > new
> > > > >> > >>> > > > AsyncFSStorageAgent(new Path(pathUri,
> > > > >> > >>> > > > LogicalPlan.SUBDIR_CHECKPOINTS).toString(),
null));
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
The AsyncFSStorageAgent(String path,
> > > Configuration
> > > > >> conf)
> > > > >> > >>> uses
> > > > >> > >>> > > "."
> > > > >> > >>> > > > as localBasePath and therefore
creates sub-directories
> > per
> > > > >> > >>> operator in
> > > > >> > >>> > > the
> > > > >> > >>> > > > current working directory.
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
I am going to create a ticket to address this
> > and
> > > > >> will
> > > > >> > >>> fix it.
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
-Chandni
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
On Wed, Sep 2, 2015 at 7:13 PM, Chandni
> Singh <
> > > > >> > >>> > > > >>>>>>>>
chandni@datatorrent.com> wrote:
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>>
Hi,
> > > > >> > >>> > > > >>>>>>>>>
> > > > >> > >>> > > > >>>>>>>>>
I can see empty folders getting created
> under
> > > > >> > Malhar/lib
> > > > >> > >>> > called
> > > > >> > >>> > > > >>>>>>>>>
'1' and '2'.
> > > > >> > >>> > > > >>>>>>>>>
I think this is because of using LocalMode
> to
> > > run
> > > > a
> > > > >> > test
> > > > >> > >>> > > > >>>>>>>>>
application.
> > > > >> > >>> > > > >>>>>>>>>
> > > > >> > >>> > > > >>>>>>>>>
> > > > >> > >>> > > > >>>>>>>>>
If anyone has checked in such cases please
> do
> > > > check
> > > > >> and
> > > > >> > >>> let
> > > > >> > >>> > us
> > > > >> > >>> > > > >>>>>>>>>
know.
> > > > >> > >>> > > > >>>>>>>>>
> > > > >> > >>> > > > >>>>>>>>>
Thanks,
> > > > >> > >>> > > > >>>>>>>>>
Chandni
> > > > >> > >>> > > > >>>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>>
> > > > >> > >>> > > > >>>>>>>
> > > > >> > >>> > > > >>>>>>
> > > > >> > >>> > > > >>>>>
> > > > >> > >>> > > > >>>>
> > > > >> > >>> > > > >>>
> > > > >> > >>> > > > >>
> > > > >> > >>> > > > >
> > > > >> > >>> > > >
> > > > >> > >>> > >
> > > > >> > >>> >
> > > > >> > >>>
> > > > >> > >>
> > > > >> > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message