hadoop-pig-dev mailing list archives

From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-575) Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema
Date Mon, 22 Dec 2008 19:04:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658625#action_12658625 ]

Santhosh Srinivasan commented on PIG-575:
-----------------------------------------

The FieldSchema member variable schema is public, so it can be accessed directly without a
getSchema() method, although having the method would make the code cleaner.
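As a rough illustration of the point above, the sketch below uses a simplified stand-in for `Schema.FieldSchema`. The class `FieldSchemaSketch` is hypothetical, not Pig's real class, and it models a nested schema as a plain list of fields. Reading the public member and calling a `getSchema()` accessor reach the same nested schema; the accessor only changes how the chain reads.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for Pig's Schema.FieldSchema: only an alias,
// a public nested-schema member, and the proposed accessor. Not Pig's real class.
class FieldSchemaSketch {
    public final String alias;
    public final List<FieldSchemaSketch> schema; // nested fields; null for atomic types

    FieldSchemaSketch(String alias, FieldSchemaSketch... nested) {
        this.alias = alias;
        this.schema = nested.length == 0 ? null : Arrays.asList(nested);
    }

    // The accessor requested in PIG-575; it simply returns the public member.
    public List<FieldSchemaSketch> getSchema() {
        return schema;
    }

    public static void main(String[] args) {
        // Models bag { tuple ( seq, value ) } as bag -> tuple -> fields.
        FieldSchemaSketch bag = new FieldSchemaSketch("A",
                new FieldSchemaSketch(null,
                        new FieldSchemaSketch("seq"),
                        new FieldSchemaSketch("value")));

        List<FieldSchemaSketch> viaMember = bag.schema.get(0).schema;
        List<FieldSchemaSketch> viaAccessor = bag.getSchema().get(0).getSchema();
        System.out.println(viaMember == viaAccessor);  // true
        System.out.println(viaAccessor.get(1).alias);  // value
    }
}
```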

> Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-575
>                 URL: https://issues.apache.org/jira/browse/PIG-575
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: David Ciemiewicz
>            Priority: Minor
>
> I have discovered that it is not possible to recurse through parts of the input Schema in the UDF outputSchema function.
> I have a function that operates on an input bag of tuples and then creates sequential pairings of the rows.
> A = foreach One generate { 
> ( 1, a ),
> ( 2, b )
> }   as  bag { tuple ( seq: int, value: chararray ) };
> The output of the PAIRS(A) should be:
> {
> ( ( 1, a ), ( 2, b ) ),
> ( ( 2, b ), ( null, null ) )
> }
> The default output schema for the function should be:
> bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, value: chararray ) ) }
> The problem I have is that I'm not able to recurse into the internal Schema of the FieldSchema in my outputSchema function to get at the tuple within the input bag.
> Here's my sample outputSchema for PAIRS:
>     public Schema outputSchema(Schema input) {
>         try {
>         System.out.println("input: " + input.toString());
>         Schema databagSchema = new Schema();
>         Schema tupleSchema = new Schema();
>         Schema inputDataBag = new Schema(input.getFields().get(0));
>         System.out.println("inputDataBag: " + input.getFields().get(0).toString());
> //
> //  RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields.get(0).getSchema
> //
>         Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0);  // Here's where I want to say
>         System.out.println("inputTuple: " + inputTuple.toString());
>         databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
>         System.out.println("databagSchema: " + databagSchema.toString());
>         return new Schema(
>             new Schema.FieldSchema(
>                 getSchemaName( this.getClass().getName().toLowerCase(), input),
>                 databagSchema,
>                 DataType.BAG
>             )
>         );
>         } catch (Exception e) {
>                 return null;
>         }
>     }
> Here's the execution output from outputSchema:
> input: {A: {seq: int,value: chararray},int,int}
> inputDataBag: A: bag({seq: int,value: chararray})
> inputTuple: A: bag({seq: int,value: chararray})    <= what I want to see is ( seq: int, value: chararray )
> rowSchema: A: bag({seq: int,value: chararray})
> rowSchema: A: bag({seq: int,value: chararray})
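The sequential pairing that PAIRS is described to perform can be sketched on plain Java lists, independent of Pig's DataBag and Tuple types. This is a minimal sketch under two assumptions: each row is a list of fields, and the missing successor of the last row is a row of nulls with the same width, matching the expected PAIRS(A) output. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Pig): pairs each row with the row after it,
// using plain Java lists in place of Pig's DataBag and Tuple types.
class SequentialPairs {

    // Returns one [current, next] pair per input row. The last row has no
    // successor, so it is paired with a row of nulls of the same width.
    static List<List<List<Object>>> pairs(List<List<Object>> rows) {
        List<List<List<Object>>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<Object> current = rows.get(i);
            List<Object> next = (i + 1 < rows.size())
                    ? rows.get(i + 1)
                    : new ArrayList<Object>(Collections.nCopies(current.size(), null));
            result.add(Arrays.asList(current, next));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> a = Arrays.asList(
                Arrays.<Object>asList(1, "a"),
                Arrays.<Object>asList(2, "b"));
        System.out.println(pairs(a)); // [[[1, a], [2, b]], [[2, b], [null, null]]]
    }
}
```

Padding with a null row keeps the output the same length as the input, so every row appears once in the first position of a pair.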
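A sketch of the step the reporter is after: reach through the bag field to its row tuple, then reuse that tuple's schema for both halves of the output pair. It uses a hypothetical stand-in class `FS` rather than Pig's `Schema`/`FieldSchema`. It assumes the bag's nested schema holds a single tuple field, as in `bag { tuple ( seq: int, value: chararray ) }`, and it hard-codes the output alias `pairs` where the original calls `getSchemaName(...)`. With Pig's real classes the equivalent hop would go through the public `schema` member mentioned in the comment; check the printed input schema for the nesting your Pig version actually produces. The sketch keeps the input names `seq` and `value` rather than the `order` alias shown in the desired schema.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Pig's Schema.FieldSchema (not the real class).
class FS {
    final String alias;
    final String type;
    final List<FS> schema; // nested fields of a bag or tuple; null for atomic types

    FS(String alias, String type, FS... nested) {
        this.alias = alias;
        this.type = type;
        this.schema = nested.length == 0 ? null : Arrays.asList(nested);
    }

    // Recursively renders "alias: type(nested, ...)".
    @Override
    public String toString() {
        String head = (alias == null ? "" : alias + ": ") + type;
        if (schema == null) return head;
        StringBuilder sb = new StringBuilder(head).append("(");
        for (int i = 0; i < schema.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(schema.get(i));
        }
        return sb.append(")").toString();
    }
}

class PairsSchemaSketch {
    // Mirrors the intent of PAIRS.outputSchema: look through the first input
    // field (the bag) to its row tuple, then build bag(tuple(row, row)).
    static FS outputSchema(List<FS> input) {
        FS bag = input.get(0);
        FS rowTuple = bag.schema.get(0); // the hop into the bag's nested schema
        // Both halves reuse the same row schema object; copy it if schema
        // objects in your setting are mutable.
        FS pair = new FS(null, "tuple", rowTuple, rowTuple);
        return new FS("pairs", "bag", pair);
    }

    public static void main(String[] args) {
        FS row = new FS(null, "tuple", new FS("seq", "int"), new FS("value", "chararray"));
        List<FS> input = Arrays.asList(new FS("A", "bag", row), new FS(null, "int"), new FS(null, "int"));
        // pairs: bag(tuple(tuple(seq: int, value: chararray), tuple(seq: int, value: chararray)))
        System.out.println(outputSchema(input));
    }
}
```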

-- 
This message is automatically generated by JIRA.

