pig-dev mailing list archives

From "Gaurav Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Date Sat, 06 Feb 2010 00:02:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830383#action_12830383 ]

Gaurav Jain commented on PIG-1115:
----------------------------------

Proposed Solution:

-- Zebra will implement ZebraOutputCommitter

-- The Zebra FrontEnd will create all the final directories and schema files:

                    $basicTable/.btschema
                    $basicTable/CG0/.schema
                    $basicTable/CG1/.schema


-- Zebra will create a temporary directory per BasicTable and write all data there during
RecordWriter.write(), under

                     $basicTable/_temporary/CG0/part-0000
                     $basicTable/_temporary/CG1/part-0000

-- _temporary directory will always be created under $basicTable
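
The layout above can be sketched as a few path helpers. This is only an illustration: the class and method names are hypothetical (they are not part of Zebra), and java.nio.file.Path stands in for Hadoop's Path so the sketch runs without a cluster.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Zebra; mirrors the directory layout proposed above. */
final class ZebraLayoutSketch {
    /** Name of the per-table temporary directory, always directly under $basicTable. */
    static final String TEMP_DIR = "_temporary";

    /** Final schema file of a column group: $basicTable/$cg/.schema */
    static Path schemaFile(Path basicTable, String cg) {
        return basicTable.resolve(cg).resolve(".schema");
    }

    /** Temporary location of a part file: $basicTable/_temporary/$cg/$part */
    static Path tempPart(Path basicTable, String cg, String part) {
        return basicTable.resolve(TEMP_DIR).resolve(cg).resolve(part);
    }

    /** Final location of a part file: $basicTable/$cg/$part */
    static Path finalPart(Path basicTable, String cg, String part) {
        return basicTable.resolve(cg).resolve(part);
    }

    public static void main(String[] args) {
        Path table = Paths.get("basicTable"); // assumed example table root
        System.out.println(schemaFile(table, "CG0"));
        System.out.println(tempPart(table, "CG0", "part-0000"));
        System.out.println(finalPart(table, "CG0", "part-0000"));
    }
}
```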

-- In the BackEnd, Zebra creates RecordWriters, which in turn create a CGInserter. The CGInserter
works on a directory, which we call 'workOutputPath':
                                  $basicTable/_temporary/$CG/
             But it needs the .schema file, which is reached by going 2 levels up (to $basicTable)
and then into the final $CG directory. So it reads the schema file from
                                  $basicTable/$workOutputPath.getName()
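
A minimal sketch of that lookup, assuming workOutputPath always has the form $basicTable/_temporary/$CG. The class and method names are hypothetical, and java.nio.file.Path (getFileName()) is used in place of Hadoop's Path (getName()).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of Zebra. */
final class SchemaLookupSketch {
    /**
     * Given workOutputPath = $basicTable/_temporary/$CG,
     * returns $basicTable/$CG/.schema.
     */
    static Path schemaFor(Path workOutputPath) {
        Path tempDir = workOutputPath.getParent();               // $basicTable/_temporary
        Path basicTable = (tempDir == null) ? null : tempDir.getParent(); // $basicTable
        if (basicTable == null) {
            throw new IllegalArgumentException(
                "expected $basicTable/_temporary/$CG, got: " + workOutputPath);
        }
        Path cgName = workOutputPath.getFileName();               // $CG
        return basicTable.resolve(cgName).resolve(".schema");
    }

    public static void main(String[] args) {
        Path work = Paths.get("basicTable", "_temporary", "CG0"); // assumed example path
        System.out.println(schemaFor(work));
    }
}
```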

-- In CGInserter.close(),
                     $basicTable/_temporary/CG0/part-0000   ----------->   $basicTable/CG0/part-0000
-- In ZebraOutputCommitter.cleanupJob(), BasicTableOutputFormat.close() will be called.
-- In BasicTableOutputFormat.close()
                      remove ( $basicTable/_temporary/ )
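
The last three steps can be sketched on a local file system as follows. This assumes the arrow in CGInserter.close() denotes a move (rename) of the part file; the class and method names are hypothetical. On HDFS the corresponding calls would presumably be FileSystem.rename() and FileSystem.delete(path, true).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Zebra; uses the local file system via java.nio. */
final class CommitCleanupSketch {
    /** Moves $basicTable/_temporary/$cg/$part to $basicTable/$cg/$part. */
    static void promote(Path basicTable, String cg, String part) throws IOException {
        Path src = basicTable.resolve("_temporary").resolve(cg).resolve(part);
        Path dst = basicTable.resolve(cg).resolve(part);
        Files.createDirectories(dst.getParent()); // no-op if the FrontEnd already created it
        Files.move(src, dst);
    }

    /** Removes $basicTable/_temporary and everything under it, if it exists. */
    static void removeTemporary(Path basicTable) throws IOException {
        Path temp = basicTable.resolve("_temporary");
        if (!Files.exists(temp)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(temp)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because removeTemporary() deletes the whole _temporary tree, part files left behind by failed tasks are removed along with the directory, which is the leak this issue describes.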






> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>
> Temp files created by zebra during table creation are not cleaned when there is any
> task failure, which results in waste of disk space.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

