hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1197) create a new input format where a mapper spans a file
Date Mon, 01 Mar 2010 07:15:05 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839575#action_12839575 ]

He Yongqiang commented on HIVE-1197:

Looks very good overall, congrats!

Just a few minor comments:
1. Can you make inputFormatClassName accessible through getter and setter methods?
2. Some code is duplicated from HiveInputFormat; can we reuse it?
3. In BucketizedHiveRecordReader's next(), I think we should remove the "curReader == null"
check. If curReader == null, the reader has already been closed, so we should throw an exception instead.
4. I think we should remove line 207 in BucketizedHiveInputFormat:   newjob.setInputFormat(inputFormat.getClass());
5. In HiveRecordReader:
5.1 Progress is calculated as (number of splits done) / (total number of splits). Can we
make it more accurate? Assuming the work is evenly divided among all splits, something like
this: (number of splits done + currReader.getProgress()) / (total number of splits)
5.2 getPos should return currReader.getPos().
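
As a rough illustration of points 3, 5.1 and 5.2, here is a minimal sketch of a reader that walks several splits in order. It is not the code from the patch: SplitReader and BucketProgressSketch are hypothetical names, the interface is a simplified stand-in rather than Hadoop's RecordReader, and the choice of IllegalStateException is an assumption. The current reader's fraction is added to the count of finished splits before dividing, so the result stays within [0, 1].

```java
/** Hypothetical stand-in for a per-split reader; NOT Hadoop's RecordReader. */
interface SplitReader {
    float getProgress(); // fraction of this split consumed, in [0, 1]
    long getPos();       // current position within this split
}

/** Sketch only: tracks progress across an ordered list of splits. */
class BucketProgressSketch {
    private final SplitReader[] readers;
    private int splitsDone = 0;     // number of splits fully consumed
    private SplitReader curReader;  // null once exhausted or closed

    BucketProgressSketch(SplitReader... readers) {
        this.readers = readers;
        this.curReader = readers.length > 0 ? readers[0] : null;
    }

    /** Point 3: fail loudly instead of silently tolerating a null reader. */
    private void checkOpen() {
        if (curReader == null) {
            throw new IllegalStateException("reader is closed or exhausted");
        }
    }

    /** Marks the current split as finished and moves to the next one, if any. */
    void advance() {
        checkOpen();
        splitsDone++;
        curReader = splitsDone < readers.length ? readers[splitsDone] : null;
    }

    /** Point 5.1: (splits done + progress of current split) / total splits. */
    float getProgress() {
        if (readers.length == 0) {
            return 1.0f; // nothing to read, so treat as complete
        }
        float current = (curReader == null) ? 0.0f : curReader.getProgress();
        return Math.min(1.0f, (splitsDone + current) / readers.length);
    }

    /** Point 5.2: delegate to the reader of the split being consumed. */
    long getPos() {
        checkOpen();
        return curReader.getPos();
    }
}
```

With two splits and the first one half read, getProgress() in this sketch reports 0.25 rather than 0, which is the finer granularity asked for in 5.1.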

Another question: do you think it would be a good idea to have BucketizedHiveInputFormat extend
HiveInputFormat? That way the code would be clearer. We should also put the RecordReader
and InputSplit in the same file as BucketizedHiveInputFormat.

> create a new input format where a mapper spans a file
> -----------------------------------------------------
>                 Key: HIVE-1197
>                 URL: https://issues.apache.org/jira/browse/HIVE-1197
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Siying Dong
>             Fix For: 0.6.0
>         Attachments: hive.1197.1.patch
> This will be needed for Sort merge joins.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
