hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-129) need to create temp files in the task's working directory
Date Sat, 01 Mar 2008 00:05:51 GMT

    [ https://issues.apache.org/jira/browse/PIG-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574019#action_12574019 ]

Pi Song commented on PIG-129:
-----------------------------

I think the multi-directory temp file allocator concept (LocalDirAllocator in Hadoop) should be
adopted in Pig. It works as follows:
- You configure a set of temp directories (they can be on different physical
drives, so you can use more disk space).
- When a temp file is created, the system probes the configured temp directories in round-robin
fashion.
- If the selected temp directory exists and is writable, the temp file is
created there.
- If the selected temp directory does not exist or is not writable, it is
added to a blacklist and not used again.
- The next temp file starts from the next temp directory.
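
The steps above could be sketched roughly as follows. This is only an illustration of the round-robin idea, not Hadoop's LocalDirAllocator code or API: the class name RoundRobinTempFiles and its methods are hypothetical, and it assumes that when a directory is blacklisted the allocator keeps probing the remaining directories for the same file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a round-robin temp file allocator with a blacklist. */
final class RoundRobinTempFiles {
    private final List<File> dirs = new ArrayList<>();
    private final Set<File> blacklist = new HashSet<>();
    private int next = 0; // index of the directory to probe next

    RoundRobinTempFiles(List<String> configuredDirs) {
        for (String d : configuredDirs) {
            dirs.add(new File(d));
        }
    }

    /** Creates a temp file in the next usable directory (prefix needs at least 3 characters). */
    synchronized File createTempFile(String prefix, String suffix) throws IOException {
        // Probe each configured directory at most once per request.
        for (int probes = 0; probes < dirs.size(); probes++) {
            File dir = dirs.get(next);
            next = (next + 1) % dirs.size(); // the following probe or request starts one further on
            if (blacklist.contains(dir)) {
                continue; // failed earlier, never retried
            }
            if (dir.isDirectory() && dir.canWrite()) {
                return File.createTempFile(prefix, suffix, dir);
            }
            blacklist.add(dir); // missing or not writable
        }
        throw new IOException("no usable temp directory among " + dirs);
    }

    /** Returns a copy of the directories that have been blacklisted so far. */
    synchronized Set<File> blacklisted() {
        return new HashSet<>(blacklist);
    }
}
```

With three configured directories where the second one is missing, successive calls would place files in the first, the third, then the first again, and the second would end up in the blacklist.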

> need to create temp files in the task's working directory
> ---------------------------------------------------------
>
>                 Key: PIG-129
>                 URL: https://issues.apache.org/jira/browse/PIG-129
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Amir Youssefi
>
> Currently, Pig creates temp data such as spilled bags in the directory specified by java.io.tmpdir.
> The problem is that this directory is usually shared by all tasks and can easily run out of
> space.
> A better approach would be to create these files in the temp dir inside the task's working
> directory, as these locations usually have much more space and can be hosted on different
> disks, so performance could be better.
> There are 2 parts to this fix:
> (1) In org.apache.pig.data.DataBag, check whether the temp directory exists and create it
> if not before trying to create the temp file. This is somewhere around line 390 in the code.
> (2) Change mapred.child.java.opts in hadoop-site.xml to include a new value for the tmpdir
> property pointing to ./tmp. For instance:
> <property>
>         <name>mapred.child.java.opts</name>
>         <value>-Xmx1024M -Djava.io.tmpdir="./tmp"</value>
>         <description>arguments passed to child jvms</description>
> </property>
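
Part (1) of the fix described above might look something like the sketch below. The actual DataBag code is not shown in this message, so the helper class TempFileHelper and its method are hypothetical; the sketch only shows creating the temp directory on demand before the temp file is created in it.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: make sure the temp directory exists before creating a temp file in it. */
final class TempFileHelper {
    static File createTempFileIn(File tmpDir, String prefix, String suffix) throws IOException {
        // mkdirs() returns false when the directory already exists (for example,
        // another task created it concurrently), so re-check before giving up.
        if (!tmpDir.isDirectory() && !tmpDir.mkdirs() && !tmpDir.isDirectory()) {
            throw new IOException("cannot create temp directory " + tmpDir);
        }
        // Pass the directory explicitly instead of relying on the JVM default.
        return File.createTempFile(prefix, suffix, tmpDir);
    }
}
```

A caller would typically pass new File(System.getProperty("java.io.tmpdir")) as tmpDir, which resolves to ./tmp under the task's working directory when the child JVM is started with the option shown above.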

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

