hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-575) Please extend FieldSchema class with getSchema() member function for iterating over complex Schemas in Pig UDF outputSchema
Date Mon, 22 Dec 2008 19:00:44 GMT
Please extend FieldSchema class with getSchema() member function for iterating over complex
Schemas in Pig UDF outputSchema
---------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-575
                 URL: https://issues.apache.org/jira/browse/PIG-575
             Project: Pig
          Issue Type: Improvement
            Reporter: David Ciemiewicz


I have discovered that it is not possible to recurse through parts of the input Schema in
the UDF outputSchema function.

I have a function that operates on an input bag of tuples and then creates sequential pairings
of the rows.

A = foreach One generate { 
( 1, a ),
( 2, b )
}   as  bag { tuple ( seq: int, value: chararray ) };

The output of PAIRS(A) should be:

{
( ( 1, a ), ( 2, b ) ),
( ( 2, b ), ( null, null ) )
}
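
A minimal sketch of the pairing logic described above, written in plain Java rather than as a Pig UDF. The class name PairsSketch and the use of nested Lists in place of Pig's bag and tuple types are assumptions made for illustration only; the actual PAIRS implementation is not shown in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the PAIRS UDF: plain Lists model Pig bags and tuples.
public class PairsSketch {

    /** Pairs each row with its successor; the last row is paired with an all-null row. */
    public static List<List<List<Object>>> pairs(List<List<Object>> rows) {
        List<List<List<Object>>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<Object> current = rows.get(i);
            List<Object> next;
            if (i + 1 < rows.size()) {
                next = rows.get(i + 1);
            } else {
                // No successor: pad with one null per field of the current row.
                next = new ArrayList<>();
                for (int j = 0; j < current.size(); j++) {
                    next.add(null);
                }
            }
            out.add(Arrays.asList(current, next));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(1, "a"),
            Arrays.<Object>asList(2, "b"));
        // prints [[[1, a], [2, b]], [[2, b], [null, null]]]
        System.out.println(pairs(bag));
    }
}
```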

The default output schema for the function should be:

bag { tuple ( tuple ( order: int, value: chararray ), tuple ( order: int, value: chararray ) ) }

The problem I have is that I'm not able to recurse into the internal Schema of the FieldSchema
in my outputSchema function to get at the tuple within the input bag.

Here's my sample outputSchema for PAIRS:

    public Schema outputSchema(Schema input) {
        try {
            System.out.println("input: " + input.toString());

            Schema databagSchema = new Schema();
            Schema tupleSchema = new Schema();

            // Wrap the first input field (the bag) in its own Schema.
            Schema inputDataBag = new Schema(input.getFields().get(0));
            System.out.println("inputDataBag: " + input.getFields().get(0).toString());

            //
            //  RIGHT HERE IS WHERE I WANT TO DO inputDataBag.getFields().get(0).getSchema()
            //
            // As written, this still yields the bag's FieldSchema, not the tuple inside it.
            Schema.FieldSchema inputTuple = inputDataBag.getFields().get(0);
            System.out.println("inputTuple: " + inputTuple.toString());

            databagSchema.add(new Schema.FieldSchema(null, DataType.TUPLE));
            System.out.println("databagSchema: " + databagSchema.toString());

            return new Schema(
                new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    databagSchema,
                    DataType.BAG
                )
            );
        } catch (Exception e) {
            return null;
        }
    }

Here's the execution output from outputSchema:

input: {A: {seq: int,value: chararray},int,int}
inputDataBag: A: bag({seq: int,value: chararray})
inputTuple: A: bag({seq: int,value: chararray})    <= what I want to see is ( seq: int, value: chararray )
rowSchema: A: bag({seq: int,value: chararray})
rowSchema: A: bag({seq: int,value: chararray})
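
To illustrate the request, here is a self-contained sketch of how outputSchema could recurse into the bag if FieldSchema exposed a getSchema() accessor. The Schema and FieldSchema classes below are simplified, hypothetical stand-ins and not Pig's real classes; the string type names, the "pairs" alias, and the toString format are likewise assumptions chosen only to keep the example runnable on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaSketch {

    // Hypothetical stand-in for Pig's Schema.FieldSchema.
    public static final class FieldSchema {
        final String alias;   // may be null for anonymous fields
        final String type;    // e.g. "int", "tuple", "bag"
        final Schema schema;  // nested schema, or null for scalar types

        public FieldSchema(String alias, String type, Schema schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }

        // The accessor this issue asks for.
        public Schema getSchema() { return schema; }

        @Override public String toString() {
            String prefix = (alias == null) ? "" : alias + ": ";
            return (schema == null) ? prefix + type : prefix + type + schema;
        }
    }

    // Hypothetical stand-in for Pig's Schema.
    public static final class Schema {
        private final List<FieldSchema> fields = new ArrayList<>();

        public Schema(FieldSchema... fs) { fields.addAll(Arrays.asList(fs)); }

        public List<FieldSchema> getFields() { return fields; }

        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(fields.get(i));
            }
            return sb.append(")").toString();
        }
    }

    /** Builds bag { tuple ( tuple ( row ), tuple ( row ) ) } from an input whose first field is a bag. */
    public static Schema pairsOutputSchema(Schema input) {
        FieldSchema inputBag = input.getFields().get(0);
        // Recurse: bag -> tuple -> the tuple's own fields.
        FieldSchema inputTuple = inputBag.getSchema().getFields().get(0);
        Schema row = inputTuple.getSchema();

        Schema pair = new Schema(
            new FieldSchema(null, "tuple", row),
            new FieldSchema(null, "tuple", row));
        return new Schema(
            new FieldSchema("pairs", "bag",
                new Schema(new FieldSchema(null, "tuple", pair))));
    }

    public static void main(String[] args) {
        Schema row = new Schema(
            new FieldSchema("seq", "int", null),
            new FieldSchema("value", "chararray", null));
        Schema input = new Schema(
            new FieldSchema("A", "bag", new Schema(new FieldSchema(null, "tuple", row))));
        System.out.println(pairsOutputSchema(input));
    }
}
```

With an accessor like this, the inner row schema ( seq: int, value: chararray ) is reachable in two getSchema() calls, which is what the inputTuple line above was meant to print.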


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

