hadoop-pig-dev mailing list archives

From "Amir Youssefi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-129) need to create temp files in the task's working directory
Date Fri, 07 Mar 2008 00:08:58 GMT

    [ https://issues.apache.org/jira/browse/PIG-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575968#action_12575968 ]

Amir Youssefi commented on PIG-129:

Here is a summary of the decisions made with Olga:

The goal of this JIRA is to provide a means of creating a temporary directory under the Hadoop task directory.
I will open a new JIRA so that others (Pi Song) can continue work on local mode and multiple directories.

 - We address the case in which the Hadoop platform is used.
 - We rely on Hadoop to clean up the directory.
 - We tested this on a cluster and observed logs showing the creation of the directory and the actual directory/file being generated.
 - The added code block is actually called from within a synchronized block of code. The second check on directory creation was added because of a case observed on a cluster.
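The pattern summarized above (create a directory under the task's working directory from synchronized code, then check again after the creation attempt) could be sketched as follows. This is not the code from PIG-129.patch: the class `TaskTempDir`, the method `getOrCreate`, and the `tmp` subdirectory name are hypothetical. It assumes the task JVM's working directory is the Hadoop task directory and leaves cleanup to Hadoop. Reading the "second check" as tolerating a concurrent creator is also an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper (not from PIG-129.patch): lazily creates a temp
 * directory under a base directory, assumed to be the task's working dir.
 */
public final class TaskTempDir {
    private final File dir;

    public TaskTempDir(File baseDir, String name) {
        this.dir = new File(baseDir, name);
    }

    /**
     * Returns the temp directory, creating it on first use. Synchronized so
     * that threads sharing this instance do not race on creation.
     */
    public synchronized File getOrCreate() {
        if (!dir.isDirectory()) {
            boolean created = dir.mkdirs();
            // Second check: mkdirs() also returns false when someone else
            // created the directory in the meantime, so only fail if it is
            // still missing afterwards.
            if (!created && !dir.isDirectory()) {
                throw new UncheckedIOException(new IOException(
                        "Could not create temp directory " + dir.getAbsolutePath()));
            }
        }
        return dir;
    }

    public static void main(String[] args) {
        // Assumption: the JVM runs with the Hadoop task directory as its cwd.
        File cwd = new File(System.getProperty("user.dir"));
        TaskTempDir tmp = new TaskTempDir(cwd, "tmp");
        System.out.println("Temp dir: " + tmp.getOrCreate().getAbsolutePath());
    }
}
```

No explicit delete is attempted here, matching the decision to rely on Hadoop to clean up the task directory.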


> need to create temp files in the task's working directory
> ---------------------------------------------------------
>                 Key: PIG-129
>                 URL: https://issues.apache.org/jira/browse/PIG-129
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Amir Youssefi
>         Attachments: PIG-129.patch, TempAllocator0.patch
> Currently, Pig creates temp data such as spilled bags in the directory specified by java.io.tmpdir. The problem is that this directory is usually shared by all tasks and can easily run out of space.
> A better approach would be to create these files in the temp dir inside the task's working directory, as these locations usually have much more space and can also be hosted on different disks, so performance could be better.
> There are 2 parts to this fix:
> (1) In org.apache.pig.data.DataBag, check whether the temp directory exists and, if not, create it before trying to create the temp file. This is somewhere around line 390 in the code.
> (2) Change mapred.child.java.opts in hadoop-site.xml to include a new value for the java.io.tmpdir property that points to ./tmp. For instance:
> <property>
>         <name>mapred.child.java.opts</name>
>         <value>-Xmx1024M -Djava.io.tmpdir="./tmp"</value>
>         <description>arguments passed to child jvms</description>
> </property>
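Part (1) amounts to making sure the directory named by java.io.tmpdir exists before the spill file is created in it. A minimal sketch, assuming the spill file is created with File.createTempFile; the class `SpillFiles`, the method `createSpillFile`, and the `pigbag` prefix are hypothetical and not taken from DataBag. With part (2) applied, java.io.tmpdir is the relative path ./tmp, which the JVM resolves against its working directory, i.e. the task's working directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of step (1): ensure the temp dir exists before creating a spill file. */
public final class SpillFiles {
    private SpillFiles() {}

    /** Creates an empty temp file in tmpDir, creating tmpDir first if it is missing. */
    public static File createSpillFile(File tmpDir) {
        // Relative paths such as "./tmp" resolve against the JVM's working directory.
        // Re-check after mkdirs() in case another task created the directory first.
        if (!tmpDir.isDirectory() && !tmpDir.mkdirs() && !tmpDir.isDirectory()) {
            throw new UncheckedIOException(new IOException(
                    "Unable to create temp directory " + tmpDir.getAbsolutePath()));
        }
        try {
            // A null suffix makes createTempFile use ".tmp".
            return File.createTempFile("pigbag", null, tmpDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Uses the directory named by java.io.tmpdir, e.g. ./tmp when set via mapred.child.java.opts. */
    public static File createSpillFile() {
        return createSpillFile(new File(System.getProperty("java.io.tmpdir")));
    }

    public static void main(String[] args) {
        File f = createSpillFile();
        System.out.println("Spill file: " + f.getAbsolutePath());
    }
}
```

Passing the directory explicitly, rather than calling the two-argument File.createTempFile, keeps the existence check and the file creation pointed at the same location.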

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
