hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-159) Make changes to the parser to support new types functionality
Date Thu, 08 May 2008 12:16:55 GMT

    [ https://issues.apache.org/jira/browse/PIG-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595200#action_12595200 ]

Pi Song commented on PIG-159:

I'm thinking that the load command should also support column selection.

Possibly like this:-

a = load 'myfile' as (int a:$0, float b:$1, chararray c:$3);

or, if the schema can be discovered from the source file:-

a = load 'myfile' as (int a:'col1', float b:'col3', chararray c:'col7');

where everything from "as" onward is optional.

Supporting column selection in the LOAD operator has 2 benefits:-
1) It lets us make execution smarter by automatically excluding unused columns.
2) It paves the way toward a column-based implementation.

The column selection work can be done after the types branch. I propose that the new load
syntax shouldn't preclude us from doing it.
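
To make the positional form concrete, here is a rough Python sketch of what column selection at load time could mean. It is not Pig code and not a proposed implementation: the names parse_as_clause, project, Field and CASTS are hypothetical, the input is assumed to be tab-delimited, and the type-to-cast mapping is my own guess for illustration.

```python
import re
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    type_name: str   # e.g. "int"
    alias: str       # e.g. "a"
    position: int    # zero-based source column, from "$N"


# Hypothetical mapping from the type names used above to Python casts.
CASTS = {"int": int, "long": int, "float": float, "double": float, "chararray": str}

# Matches one field spec of the form "<type> <alias>:$<position>".
FIELD_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*:\s*\$(\d+)\s*$")


def parse_as_clause(clause: str) -> List[Field]:
    """Parse "(int a:$0, float b:$1)" into a list of Field records."""
    inner = clause.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    fields = []
    for part in inner.split(","):
        match = FIELD_RE.match(part)
        if match is None:
            raise ValueError(f"cannot parse field spec: {part!r}")
        type_name, alias, position = match.groups()
        if type_name not in CASTS:
            raise ValueError(f"unknown type: {type_name!r}")
        fields.append(Field(type_name, alias, int(position)))
    return fields


def project(line: str, fields: List[Field], delimiter: str = "\t") -> Dict[str, object]:
    """Keep only the selected columns of one input line, cast to their declared types."""
    columns = line.rstrip("\n").split(delimiter)
    return {f.alias: CASTS[f.type_name](columns[f.position]) for f in fields}


schema = parse_as_clause("(int a:$0, float b:$1, chararray c:$3)")
row = project("7\t2.5\tunused\thello", schema)
```

Column $2 is never touched here, which is the point of benefit 1: a loader that knows the selection up front can skip unused columns instead of materializing them.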

> Make changes to the parser to support new types functionality
> -------------------------------------------------------------
>                 Key: PIG-159
>                 URL: https://issues.apache.org/jira/browse/PIG-159
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
> In order to support the new types functionality described in http://wiki.apache.org/pig/PigTypesFunctionalSpec,
> the parser needs to change in the following ways:
> 1) AS needs to support types in addition to aliases.  So where previously it was legal
> to say:
> a = load 'myfile' as a, b, c;
> it will now also be legal to say
> a = load 'myfile' as a integer, b float, c chararray;
> 2) Non-string constants need to be supported.  This includes non-string atomic types
> (integer, long, float, double) and the non-atomic types bags, tuples, and maps.
> 3) A cast operator needs to be added so that fields can be explicitly cast.
> 4) Changes to DEFINE, to allow users to declare arguments and return types for UDFs
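
As an illustration of item 1 in the description above, the following Python sketch accepts both the old alias-only AS list and the new alias-plus-type form. It is only a toy under stated assumptions, not the Pig parser: parse_as_list and KNOWN_TYPES are hypothetical names, and the set of type names is taken from the ones mentioned in the description.

```python
from typing import List, Optional, Tuple

# Type names taken from the issue description; the set itself is an assumption.
KNOWN_TYPES = {"integer", "long", "float", "double", "chararray"}


def parse_as_list(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse "a, b, c;" or "a integer, b float;" into (alias, type-or-None) pairs."""
    result = []
    for item in text.strip().rstrip(";").split(","):
        tokens = item.split()
        if len(tokens) == 1:
            # Old form: alias only, no declared type.
            result.append((tokens[0], None))
        elif len(tokens) == 2 and tokens[1] in KNOWN_TYPES:
            # New form: alias followed by a type name.
            result.append((tokens[0], tokens[1]))
        else:
            raise ValueError(f"bad AS item: {item!r}")
    return result


untyped = parse_as_list("a, b, c;")
typed = parse_as_list("a integer, b float, c chararray;")
```

Returning None for a missing type keeps the old alias-only statements legal, which matches the "it will now also be legal" wording above.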
