hadoop-pig-dev mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-652) Need to give user control of OutputFormat
Date Tue, 10 Feb 2009 18:52:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672362#action_12672362 ]

Hong Tang commented on PIG-652:
-------------------------------

Again, I might be missing something that is obvious to you. My understanding is that any OutputFormat
class is instantiated through its default constructor, so all of its state has to be passed
from the JobClient side to the task side through the JobConf. Here is how people create a basic table
directly through BasicTableOutputFormat:

At JobClient side:
{code}
jobConf.setOutputFormat(BasicTableOutputFormat.class);
Path outPath = new Path("path/to/the/BasicTable");
BasicTableOutputFormat.setOutputPath(jobConf, outPath);
String schema = "Name, Age, Salary, BonusPct";
BasicTableOutputFormat.setSchema(jobConf, schema);
{code}

At task side (in the reducer):
{code}
static class MyReduceClass implements Reducer<K, V, BytesWritable, Tuple> {
  Tuple outRow;
  int idxName, idxAge, idxSalary, idxBonusPct;
 
  public void configure(JobConf job) {
    Schema outSchema = BasicTableOutputFormat.getSchema(job);
    outRow = TypesUtils.createTuple(outSchema);
    idxName = outSchema.getColumnIndex("Name");
    idxAge = outSchema.getColumnIndex("Age");
    idxSalary = outSchema.getColumnIndex("Salary");
    idxBonusPct = outSchema.getColumnIndex("BonusPct");
  }
  public void reduce(K key, Iterator<V> values,         
      OutputCollector<BytesWritable, Tuple> output, Reporter reporter)
      throws IOException {
    String name;
    int age, salary;
    double bonusPct;

    // ... Determine individual field values of the row to be inserted ...

    try {
      // Fill the reusable row created in configure(), then emit it.
      outRow.set(idxName, name);
      outRow.set(idxAge, Integer.valueOf(age));
      outRow.set(idxSalary, Integer.valueOf(salary));
      outRow.set(idxBonusPct, Double.valueOf(bonusPct));
      output.collect(new BytesWritable(name.getBytes()), outRow);
    } catch (ExecException e) {
      // should never happen
    }
  }
 
  public void close() throws IOException { /* no-op */  } 
}
{code}
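
To make the default-constructor point concrete, here is a minimal sketch of the pattern. It is not the actual BasicTableOutputFormat code. It assumes java.util.Properties as a stand-in for JobConf so that it runs without Hadoop; the class ConfBackedOutputFormat, the key "sketch.output.schema", and the instantiate() helper are hypothetical names made up for illustration.

```java
import java.util.Properties;

// Stand-in for an OutputFormat. It has only a default constructor, so every
// piece of state it needs must travel through the configuration object.
public class ConfBackedOutputFormat {
  // Hypothetical key name, chosen for this sketch only.
  static final String SCHEMA_KEY = "sketch.output.schema";

  private String[] columns; // populated on the task side from the config

  // Client side: record the schema string in the configuration.
  public static void setSchema(Properties conf, String schema) {
    conf.setProperty(SCHEMA_KEY, schema);
  }

  // Either side: read the schema string back out of the configuration.
  public static String getSchema(Properties conf) {
    return conf.getProperty(SCHEMA_KEY);
  }

  // Task side: rebuild the in-memory state from the configuration.
  public void configure(Properties conf) {
    String schema = getSchema(conf);
    if (schema == null) {
      throw new IllegalStateException("schema was not set on the client side");
    }
    columns = schema.trim().split("\\s*,\\s*");
  }

  // Returns the position of a column, or -1 if it is not in the schema.
  public int getColumnIndex(String name) {
    for (int i = 0; i < columns.length; i++) {
      if (columns[i].equals(name)) {
        return i;
      }
    }
    return -1;
  }

  // Mimics what a framework does: create the object reflectively through its
  // no-argument constructor, then hand it the configuration.
  public static ConfBackedOutputFormat instantiate(
      Class<? extends ConfBackedOutputFormat> cls, Properties conf) {
    try {
      ConfBackedOutputFormat format = cls.getDeclaredConstructor().newInstance();
      format.configure(conf);
      return format;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    setSchema(conf, "Name, Age, Salary, BonusPct");
    ConfBackedOutputFormat format = instantiate(ConfBackedOutputFormat.class, conf);
    System.out.println(format.getColumnIndex("Salary")); // prints 2
  }
}
```

Nothing reaches the task-side object except what was written into the configuration beforehand, which is why the static setter and getter pair is needed.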

> Need to give user control of OutputFormat
> -----------------------------------------
>
>                 Key: PIG-652
>                 URL: https://issues.apache.org/jira/browse/PIG-652
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>
> Pig currently allows users some control over InputFormat via the Slicer and Slice interfaces.
> It does not allow any control over the OutputFormat and RecordWriter interfaces.  It just allows
> the user to implement a storage function that controls how the data is serialized.  For Hadoop
> tables, we will need to allow custom OutputFormats that prepare output information and objects
> needed by a Table store function.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

