hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-958) Splitting output data on key field
Date Tue, 03 Nov 2009 20:35:32 GMT

    [ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773181#action_12773181 ]

Pradeep Kamath commented on PIG-958:

bq. 2. Deleting the temporary directory manually in finish(), causes the job to fail. Removed
the manual deletion. As a side effect, user specified PARENT output directory in the UDF will
have empty part-* files. These should be deleted manually by the user.

Can you explain this a little more? It has been a long time since I last looked at the code.
There seems to be some mv and this deletion happening; if you can explain that part too, it would be helpful.

Otherwise looks good.

> Splitting output data on key field
> ----------------------------------
>                 Key: PIG-958
>                 URL: https://issues.apache.org/jira/browse/PIG-958
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Ankur
>         Attachments: 958.v3.patch, 958.v4.patch
> Pig users often face the need to split the output records into a bunch of files and directories
> depending on the type of record. Pig's SPLIT operator is useful when record types are few
> and known in advance. In cases where type is not directly known but is derived dynamically
> from values of a key field in the output tuple, a custom store function is a better solution.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
