crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-304) MRPipeline.plan() does not clear up the temporary hadoop-<username> folder it creates
Date Tue, 26 Nov 2013 17:01:49 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832761#comment-13832761 ]

Micah Whitacre commented on CRUNCH-304:
---------------------------------------

Actually, I wonder if the cleanup is an implementation detail that we should hide behind the
concept of done() in some way.  There isn't clear documentation for consumers on whether they
should use run() or done(), or on the difference between the two.  So I wonder if we should
instead make it the standard that done() is always called on a pipeline to make sure
everything is cleaned up, while people can still opt into run() or plan() if they want.
To support this passively, the force option would probably be added to the done() method instead.
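
To make the idea concrete, here is a minimal sketch as a toy Java model. It is not
the MRPipeline source: ToyPipeline, tempDirExists() and the done(boolean force)
overload are hypothetical names, and the guard in cleanup() only follows the
behavior described in the issue quoted below.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Toy model only; NOT the Crunch MRPipeline source. All names are hypothetical.
public class DoneCleanupSketch {

    static class ToyPipeline {
        private final Path tempDir;
        private final Map<String, String> outputTargets = new HashMap<>();

        ToyPipeline() {
            try {
                // Stands in for the temporary folder created along with the pipeline.
                tempDir = Files.createTempDirectory("toy-pipeline-");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Registers a pending output, as a write on a real pipeline would.
        void write(String collection, String target) {
            outputTargets.put(collection, target);
        }

        // Planning only describes the job; pending outputs stay in the map.
        String plan() {
            return "digraph G { /* " + outputTargets.size() + " pending output(s) */ }";
        }

        // Running materializes the outputs, which empties the map.
        void run() {
            outputTargets.clear();
        }

        // Guard as described in the report: no cleanup while outputs are pending.
        private void cleanup() {
            if (!outputTargets.isEmpty()) {
                return;
            }
            try {
                Files.deleteIfExists(tempDir);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Assumed shape of the proposal: force discards pending outputs first.
        void done(boolean force) {
            if (force) {
                outputTargets.clear();
            }
            cleanup();
        }

        boolean tempDirExists() {
            return Files.exists(tempDir);
        }
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyPipeline();
        pipeline.write("words", "/out/words");
        System.out.println(pipeline.plan());

        pipeline.done(false); // plan-only caller: the guard skips cleanup
        System.out.println("after done(false): tempDirExists=" + pipeline.tempDirExists());

        pipeline.done(true);  // forced: pending outputs dropped, folder removed
        System.out.println("after done(true): tempDirExists=" + pipeline.tempDirExists());
    }
}
```

With this shape, a caller who only wants the plan would call plan() and then
done(true), and would never need to know that cleanup() exists.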

> MRPipeline.plan() does not clear up the temporary hadoop-<username> folder it creates
> -------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-304
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-304
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.0.0-cdh4.2.1
> Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.2.1/src/hadoop-common-project/hadoop-common -r 144bd548d481c2774fab2bec2ac2645d190f705b
> Compiled by jenkins on Mon Apr 22 10:26:03 PDT 2013
> From source with checksum aef88defdddfb22327a107fbd7063395
>            Reporter: Ganeshbabu Nelamangala
>            Assignee: Josh Wills
>            Priority: Minor
>              Labels: easyfix, newbie, patch
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> As a user I just want to run MRPipeline.plan() to retrieve the PlanningParameters.PIPELINE_PLAN_DOTFILE
> for the current pipeline configuration. However, it appears that since we don't actually call
> run(), the Map that is created to hold outputTargets still has items in it, and the cleanup()
> method will not delete the tmp directory (created when a new MRPipeline object is constructed)
> while that Map has elements in it. Since we don't really want to execute the code if we just
> want to create a plan, I don't know how we can clean up this Map. Basically, the temporary
> hadoop folders left behind are our problem.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
