pig-dev mailing list archives

From "Prashant Kommireddi (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2541) Automatic record provenance (source tagging) for PigStorage
Date Tue, 21 Feb 2012 22:58:49 GMT

    [ https://issues.apache.org/jira/browse/PIG-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213084#comment-13213084 ]

Prashant Kommireddi commented on PIG-2541:
------------------------------------------

In that case I plan to add the source tag at the start. Auto-loading 'source_tag' when a
schema is needed can be done. However, we need to think about the case where the auto-loaded
'source_tag' conflicts with a 'source_tag' field that already exists in the schema.

One way is to look for source_tag in the schema and, if it is present, append an id to the
auto-loaded source_tag, something like 'source_tag_001'. However, this is not ideal
performance-wise and is likely to cause confusion.
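
To make the renaming idea concrete, here is a minimal sketch in Python. It is not code from
the PIG-2541 patch: the function name prepend_source_tag and the use of a plain list of field
names in place of a Pig schema are assumptions made only for illustration.

```python
def prepend_source_tag(field_names, base="source_tag"):
    """Return a new field list with a source-tag column placed first.

    Hypothetical helper: field_names is a plain list of strings standing in
    for a real Pig schema. If `base` is already taken, a zero-padded numeric
    suffix is appended (source_tag_001, source_tag_002, ...) until the name
    is unique.
    """
    existing = set(field_names)
    name = base
    counter = 0
    while name in existing:
        counter += 1
        name = f"{base}_{counter:03d}"
    # The tag goes at the start, as proposed in the comment above.
    return [name] + list(field_names)


if __name__ == "__main__":
    print(prepend_source_tag(["user", "url"]))
    print(prepend_source_tag(["source_tag", "url"]))
```

The second call shows the drawback mentioned above: the schema ends up with both
'source_tag_001' and 'source_tag', and a user has to know which one was auto-loaded.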
                
> Automatic record provenance (source tagging) for PigStorage
> -----------------------------------------------------------
>
>                 Key: PIG-2541
>                 URL: https://issues.apache.org/jira/browse/PIG-2541
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.1
>            Reporter: Richard Ding
>            Assignee: Prashant Kommireddi
>         Attachments: PIG-2541.patch
>
>
> There is a lot of interest in knowing where the data comes from when loading from a
> directory (or a set of directories). One can do it manually (see https://cwiki.apache.org/confluence/display/PIG/FAQ).
> But it will be more convenient for users if we implement this in PigStorage with a command-line
> option (e.g., pig.source.tagging=true/false) to turn it on or off. By default it will be
> off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
