pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-652) Need to give user control of OutputFormat
Date Fri, 06 Feb 2009 18:57:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671240#action_12671240 ]

Alan Gates commented on PIG-652:

Sorry, forgot to include the schema part.  A second function should be added to the StoreFunc:

    /**
     * Specify a schema to be used in storing the data.  This can be used by
     * store functions that store the data in a self-describing format.  The
     * store function is free to ignore this if it cannot use it.
     * @param s schema of the output data
     */
    public void setStorageSchema(Schema s);

This function would be called during query planning.  The StoreFunc can then take responsibility
for storing away the schema so that it (or its associated OutputFormat) can access it on the
backend.  This schema will also include the sortedness of the data.
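
To make that hand-off concrete, here is a rough sketch of one way a store function could stash the schema at planning time and read it back later.  Everything in it is a stand-in for illustration: SketchSchema is not Pig's Schema, java.util.Properties plays the role of the job configuration, and the property key is made up.  It also leaves out sortedness.

```java
import java.util.Properties;

// Hypothetical stand-in for Pig's Schema; only a textual form is modeled here.
final class SketchSchema {
    private final String text;

    SketchSchema(String text) { this.text = text; }

    @Override
    public String toString() { return text; }
}

// Assumed shape of a store function that keeps the schema it is handed.
class SchemaAwareStore {
    // Made-up property key; not a real Pig or Hadoop setting.
    static final String SCHEMA_KEY = "sketch.store.schema";

    // Stand-in for the job configuration that travels to the backend.
    private final Properties jobProps;

    SchemaAwareStore(Properties jobProps) { this.jobProps = jobProps; }

    // Front end: called once during query planning.
    public void setStorageSchema(SketchSchema s) {
        jobProps.setProperty(SCHEMA_KEY, s.toString());
    }

    // Back end: the store function (or its OutputFormat) reads the schema back.
    static String readSchema(Properties jobProps) {
        return jobProps.getProperty(SCHEMA_KEY);
    }
}
```

A store function that cannot use the schema would simply leave setStorageSchema empty.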

As for making the JobConf and path available, those are passed to OutputFormat.getRecordWriter,
so those implementing their own OutputFormats will have access to them.  They can then pass
them on to their store functions as they wish.
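
A minimal sketch of that forwarding pattern, using simplified stand-ins rather than the Hadoop interfaces: SketchOutputFormat, SketchStore and SketchRecordWriter are hypothetical names, java.util.Properties stands in for JobConf, and the two-argument getRecordWriter here is a trimmed-down assumption, not the real signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a record writer; not Hadoop's RecordWriter.
interface SketchRecordWriter {
    void write(String record);
}

// Hypothetical store function that accepts whatever its output format hands it.
class SketchStore {
    Properties job;     // stand-in for JobConf
    String outputName;  // stand-in for the output path
    final List<String> written = new ArrayList<>();

    void init(Properties job, String outputName) {
        this.job = job;
        this.outputName = outputName;
    }

    // Records are tagged with the output name only to show the value arrived.
    void putNext(String record) {
        written.add(outputName + ":" + record);
    }
}

// Custom output format: it sees the job settings and path first, then forwards them.
class SketchOutputFormat {
    private final SketchStore store;

    SketchOutputFormat(SketchStore store) { this.store = store; }

    SketchRecordWriter getRecordWriter(Properties job, String name) {
        store.init(job, name);
        return store::putNext;
    }
}
```

Whether to forward both values, or only what the store function needs, is left to whoever writes the OutputFormat.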

For compression, Pig right now has no way to communicate compression types other than file
endings (.bz is the only one we support at the moment).  This is a kludge, but I don't want
to propose a whole way to coherently communicate compression in Pig at the moment.  So I vote
that we stay with this for the time being.
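
For illustration only, the file-ending convention amounts to something like the check below.  CompressionByExtension and its Codec enum are hypothetical names, not Pig code; the only fact taken from above is that ".bz" is the one recognized ending.

```java
// Hypothetical helper showing compression chosen purely from the file ending.
final class CompressionByExtension {
    enum Codec { NONE, BZ }

    private CompressionByExtension() {}

    // Returns BZ for paths ending in ".bz"; every other path is left uncompressed.
    static Codec codecFor(String path) {
        return path.endsWith(".bz") ? Codec.BZ : Codec.NONE;
    }
}
```

The weakness of the approach shows up right away: the output name is the only channel, so any new compression type means another hard-coded ending.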

> Need to give user control of OutputFormat
> -----------------------------------------
>                 Key: PIG-652
>                 URL: https://issues.apache.org/jira/browse/PIG-652
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
> Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces.
> It does not allow any control over the OutputFormat and RecordWriter interfaces.  It just allows
> the user to implement a storage function that controls how the data is serialized.  For Hadoop
> tables, we will need to allow custom OutputFormats that prepare output information and objects
> needed by a Table store function.

