hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5635) FileInputFormat does not specify how the file is split
Date Wed, 20 Nov 2013 23:27:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828281#comment-13828281 ]

Jason Lowe commented on MAPREDUCE-5635:
---------------------------------------

FileInputFormat does not require that the file is a plain text file broken into lines with
carriage-return or linefeed used as line delimiters.  That's what TextInputFormat is for.

FileInputFormat is an abstract class that makes no assumptions about how the data in the file
is formatted.  Concrete implementations that derive from FileInputFormat must implement the
getRecordReader method (createRecordReader in the newer org.apache.hadoop.mapreduce API), which
dictates how records are read from the file and therefore what format the file must have for
that particular input format.
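
To illustrate the point, here is a minimal sketch in plain Java that does not use the Hadoop
API at all.  SketchFileFormat, SketchTextFormat and SketchFixedWidthFormat are hypothetical
names made up for this example; readRecords only stands in for the role that getRecordReader
plays in a real input format.  The same bytes yield different records depending on which
concrete class interprets them, which is why the abstract base class cannot document a
record layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileInputFormat: it knows nothing about record layout.
abstract class SketchFileFormat {
    // Subclasses decide how raw bytes become records.
    abstract List<String> readRecords(byte[] data);
}

// Line-oriented format in the spirit of TextInputFormat: LF, CR, or CRLF ends a record.
class SketchTextFormat extends SketchFileFormat {
    @Override
    List<String> readRecords(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        if (text.isEmpty()) {
            return records; // an empty file has no records
        }
        for (String line : text.split("\r\n|\r|\n")) {
            records.add(line);
        }
        return records;
    }
}

// A different concrete format over the same bytes: every `width` bytes is one record.
class SketchFixedWidthFormat extends SketchFileFormat {
    private final int width;

    SketchFixedWidthFormat(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        this.width = width;
    }

    @Override
    List<String> readRecords(byte[] data) {
        List<String> records = new ArrayList<>();
        // A trailing partial record is dropped in this sketch.
        for (int off = 0; off + width <= data.length; off += width) {
            records.add(new String(data, off, width, StandardCharsets.UTF_8));
        }
        return records;
    }
}

class FormatDemo {
    public static void main(String[] args) {
        byte[] data = "ab\ncd\nef".getBytes(StandardCharsets.UTF_8);
        // Same input, two interpretations.
        System.out.println(new SketchTextFormat().readRecords(data));
        System.out.println(new SketchFixedWidthFormat(4).readRecords(data));
    }
}
```

In this sketch the line-based class treats the newline bytes as delimiters, while the
fixed-width class treats them as ordinary record content.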

> FileInputFormat does not specify how the file is split
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: Does not matter.
>            Reporter: Pranay Varma
>
> Here is what the TextInputFormat javadoc says:
> [TextInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html]
> An InputFormat for plain text files. Files are broken into lines. Either linefeed or
> carriage-return are used to signal end of line. Keys are the position in the file, and values
> are the line of text.
> FileInputFormat should say the same on
> [FileInputFormat|http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html]



--
This message was sent by Atlassian JIRA
(v6.1#6144)
