hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-360) Generalize the FileFormat Interface in Hive
Date Thu, 09 Apr 2009 17:48:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697571#action_12697571 ]

Joydeep Sen Sarma commented on HIVE-360:
----------------------------------------

Looked at this a bit; it looks great to me.

One comment: the getFinalPath call should be made part of the HiveOutputFormat as well.
Actually, all we need is to let the output format determine the file extension; the rest of the
path name is always the same. It's not a big deal, though.
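
To illustrate the suggestion above, here is a minimal sketch of letting the output format supply only the file extension while the rest of the path is built in one place. All names here (ExtensionAwareOutputFormat, TextFormatSketch, FinalPathSketch) are hypothetical and are not taken from the HIVE-360 patch; the ".gz" extension is only a placeholder.

```java
// Hypothetical sketch: none of these names come from the HIVE-360 patch.
interface ExtensionAwareOutputFormat {
    // Returns the file extension including the leading dot, or "" for none.
    String getFileExtension(boolean isCompressed);
}

// Example format: plain text, with a placeholder extension when compressed.
final class TextFormatSketch implements ExtensionAwareOutputFormat {
    @Override
    public String getFileExtension(boolean isCompressed) {
        return isCompressed ? ".gz" : "";
    }
}

final class FinalPathSketch {
    // The directory and task id part of the path is the same for every format;
    // only the extension is delegated to the output format.
    static String finalPath(String tmpDir, String taskId,
                            ExtensionAwareOutputFormat format, boolean isCompressed) {
        return tmpDir + "/" + taskId + format.getFileExtension(isCompressed);
    }
}
```

With this shape, supporting a new format would only mean implementing getFileExtension; the path-building code would stay untouched.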

> Generalize the FileFormat Interface in Hive
> -------------------------------------------
>
>                 Key: HIVE-360
>                 URL: https://issues.apache.org/jira/browse/HIVE-360
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-360-2009-03-31.patch, hive-360-2009-04-01.patch, hive-360-2009-04-04-4.patch,
>                      hive-360-2009-04-07-5.patch, hive-360-2009-04-08-3.patch, hive-360-2009-04-08.patch,
>                      hive-360-2009-04-09-3.patch, hive-360-2009-04-09.patch, HIVE-360.patch, qfile.tar
>
>
> Currently the FileFormat support in Hive is not generalized: we do "if ... else" to
> support TextFileFormat and SequenceFileFormat. There is no way to support a third format
> without changing the "if ... else" structure. We should make an interface for the
> FileFormat needed by Hive.
> The OutputFileFormat interface that Hive requires will contain one more method than the
> Hadoop OutputFileFormat: one that creates a file with a specific name.
> Hive.g:409 (Hive.g already supports the custom file format, but DDLSemanticAnalyzer.java
> is not recognizing it yet):
> {code}
> KW_STORED KW_AS KW_INPUTFORMAT inFmt=StringLiteral KW_OUTPUTFORMAT outFmt=StringLiteral
> {code}
> Please add the handling of TOK_TABLEFILEFORMAT here:
> DDLSemanticAnalyzer.java:223
> {code}
>         case HiveParser.TOK_TBLSEQUENCEFILE:
>         ...
> {code}
> Please add the handling of custom outputFormat here by adding a new interface (and cast
> the user-provided file format to that interface), instead of doing "if ... else":
> FileSinkOperator.java:129-174:
> {code}
>       if(outputFormat instanceof IgnoreKeyTextOutputFormat) {
>         finalPath = new Path(Utilities.toTempPath(conf.getDirName()),
>                              Utilities.getTaskId(hconf) + Utilities.getFileExtension(jc, isCompressed));
>       ...
> {code}
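
The interface-based dispatch requested in the quoted description could look roughly like the sketch below: the sink casts the user-provided format to a Hive-specific interface and asks it for a writer on a named file, with no per-format "if ... else". Every name here (NamedFileOutputFormat, RowWriter, InMemoryTextFormat, SinkSketch) is a hypothetical stand-in, and the in-memory "files" replace real Hadoop I/O purely to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical writer abstraction; the real patch may define this differently.
interface RowWriter {
    void write(String row);
}

// Hypothetical Hive-specific interface with the "one more method":
// open a writer for a file with a specific name.
interface NamedFileOutputFormat {
    RowWriter getWriter(String finalPath);
}

// Toy format that stores rows in memory instead of on a file system.
final class InMemoryTextFormat implements NamedFileOutputFormat {
    static final Map<String, StringBuilder> FILES = new HashMap<>();

    @Override
    public RowWriter getWriter(String finalPath) {
        StringBuilder contents = FILES.computeIfAbsent(finalPath, k -> new StringBuilder());
        return row -> contents.append(row).append('\n');
    }
}

final class SinkSketch {
    // Cast the user-provided format to the interface once; reject anything else.
    static NamedFileOutputFormat asHiveFormat(Object userFormat) {
        if (!(userFormat instanceof NamedFileOutputFormat)) {
            throw new IllegalArgumentException(
                "Output format does not implement NamedFileOutputFormat: " + userFormat.getClass());
        }
        return (NamedFileOutputFormat) userFormat;
    }
}
```

The single instanceof check here validates the contract rather than selecting a branch per format, so adding a third format would not require changing the sink.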

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

