hadoop-pig-dev mailing list archives

From "Gaurav Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs
Date Wed, 10 Feb 2010 23:28:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832287#action_12832287 ]

Gaurav Jain commented on PIG-1140:

A few suggestions for the implementation:

 -- In the initialize() method, we should use
   Configuration conf = new Configuration(false), which creates an empty object.
   Configuration conf = new Configuration() populates the object from the default XML
   resources (e.g. core-default.xml and core-site.xml), which may contain conflicting properties.
    ( Good to have )
 -- In the seekNear() method, we might want to check whether tableRecordReader is null.
    ( Good to have )
 -- In createIndexReader(), since we set the projection, we should not pass a null projection:
     createTableRecordReader(job, null)
     It should be createTableRecordReader(job, TableInputFormat.getProjection(job)) (need to
 -- In setLocation() and getSchema(), if we are handling paths == null, then we might want
    to check paths.isEmpty() as well. ( Good to have )
 -- Instead of implementing new classes (TableOutputFormat and TableOutputCommitter), we should
    use BasicTableOutputFormat and BasicTableOutputFormat.TableOutputCommitter in the zebra
    mapreduce package. ( Must have )
    (There would be a separate jira/patch to do the same.)

 -- Code from storeSchema should go into TableOutputFormat.TableOutputCommitter.cleanupJob().
 -- Does Pig call OutputCommitter.abortJob() for failed jobs?
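
For the two defensive checks suggested above (the null check on tableRecordReader in seekNear() and the paths.isEmpty() check alongside paths == null), here is a minimal sketch in plain Java so it can run standalone. Everything named here (GuardSketch, SeekableReader, hasPaths, and this seekNear signature) is a hypothetical stand-in for illustration, not the actual Zebra code in the patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class GuardSketch {
    /** Hypothetical stand-in for the record reader; not a Zebra class. */
    interface SeekableReader {
        boolean seekTo(String key);
    }

    /** Treats a null list and an empty list the same way: no input paths. */
    static boolean hasPaths(List<String> paths) {
        return paths != null && !paths.isEmpty();
    }

    /** Returns false instead of throwing NullPointerException when no reader is open. */
    static boolean seekNear(SeekableReader reader, String key) {
        if (reader == null) {
            return false;
        }
        return reader.seekTo(key);
    }

    public static void main(String[] args) {
        System.out.println(hasPaths(null));                             // false
        System.out.println(hasPaths(Collections.<String>emptyList()));  // false
        System.out.println(hasPaths(Arrays.asList("/data/t1")));        // true
        System.out.println(seekNear(null, "k"));                        // false
    }
}
```

Whether a missing reader should return false or raise an exception is a design choice for the patch author; the sketch only shows that the null and empty cases can be handled in one place.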

> [zebra] Use of Hadoop 2.0 APIs  
> --------------------------------
>                 Key: PIG-1140
>                 URL: https://issues.apache.org/jira/browse/PIG-1140
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>            Reporter: Yan Zhou
>            Assignee: Xuefu Zhang
>             Fix For: 0.7.0
>         Attachments: zebra.0209
> Currently, Zebra is still using already deprecated Hadoop 1.8 APIs. Need to upgrade to its 2.0 APIs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
