pig-dev mailing list archives

From "Gaurav Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Date Sat, 06 Feb 2010 00:02:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830383#action_12830383 ]

Gaurav Jain commented on PIG-1115:

Proposed Solution:

-- Zebra will implement ZebraOutputCommitter

-- Zebra FrontEnd will create all the final directories and schema files 


-- Zebra will create a temporary directory per BasicTable and write all data there during
RecordWriter.write() under


-- The _temporary directory will always be created under $basicTable

-- In the BackEnd, Zebra creates RecordWriters, which in turn create a CGInserter. The CGInserter works
on a directory, which we call 'workOutputPath',
             but it needs the .schema file, which is located 2 levels up. So it reads schema file

-- In CGInserter.close(),
                     $basicTable/_temporary/CG0/part-0000       ----------->
-- In ZebraOutputCommitter.cleanupJob(), BasicTableOutputFormat.close() will be called.
-- In BasicTableOutputFormat.close(),
                      remove ( $basicTable/_temporary/ )
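
The final cleanup step above can be sketched roughly as follows. This is only an illustration using plain java.nio on a local file system, not the Hadoop FileSystem API that Zebra actually uses. The class name TempDirCleanup and the method removeTemporary are made up for this example; only the _temporary/CG0/part-0000 layout comes from the proposal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Zebra: illustrates "remove ( $basicTable/_temporary/ )".
public class TempDirCleanup {
    // Name of the scratch directory, as given in the proposal.
    static final String TEMP_DIR = "_temporary";

    /**
     * Recursively deletes basicTable/_temporary if it exists.
     * Returns true if a directory was removed, false if there was nothing to clean.
     */
    static boolean removeTemporary(Path basicTable) throws IOException {
        Path temp = basicTable.resolve(TEMP_DIR);
        if (!Files.exists(temp)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(temp)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a task that wrote a part file and then failed before promoting it.
        Path basicTable = Files.createTempDirectory("basicTable");
        Path part = basicTable.resolve(TEMP_DIR).resolve("CG0").resolve("part-0000");
        Files.createDirectories(part.getParent());
        Files.write(part, new byte[] {1, 2, 3});

        removeTemporary(basicTable);
        System.out.println("temp dir still present: "
                + Files.exists(basicTable.resolve(TEMP_DIR)));
    }
}
```

Because the removal happens once per job rather than per task, leftovers from failed tasks are deleted along with everything else under _temporary.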

> [zebra] temp files are not cleaned.
> -----------------------------------
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
> Temp files created by zebra during table creation are not cleaned up when there is any
> task failure, which wastes disk space.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
