hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4065) support for reading binary data from flat files
Date Fri, 19 Sep 2008 04:08:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632501#action_12632501 ]

Owen O'Malley commented on HADOOP-4065:
---------------------------------------

I'm still not convinced about the utility of this class outside of Hive. What is the
advantage of storing the data this way?
If you put it in a sequence file or t-file, a single bug in the serialization code for
the application type doesn't destroy your entire file. With this format, that is exactly
what will happen. Furthermore, since the types have to be configured, you can't use
multiple ones in different contexts.

Maybe we should just put this into Hive?
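
The robustness argument in the comment above can be illustrated with a small sketch. It assumes a simplified container in which every record is written as a 4-byte length followed by the payload. This is not the actual SequenceFile or TFile layout, and the names FramedRecords, Decoder, writeFrame and readAll are hypothetical. Because the reader knows where each record ends before it tries to decode it, a payload that fails to deserialize costs one record rather than the rest of the file.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified length-prefixed container (not the SequenceFile/TFile format).
final class FramedRecords {

    // Application-level deserializer; may fail on a bad payload.
    interface Decoder<T> {
        T decode(byte[] payload) throws IOException;
    }

    // Writes one frame: 4-byte big-endian length, then the payload bytes.
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Reads every frame and returns the records that decoded successfully.
    static <T> List<T> readAll(InputStream raw, Decoder<T> decoder) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        List<T> good = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException e) {
                break; // sketch: EOF while reading a length is treated as end of input
            }
            if (length < 0) {
                throw new IOException("corrupt frame length: " + length);
            }
            byte[] payload = new byte[length];
            in.readFully(payload); // the frame boundary is known before decoding starts
            try {
                good.add(decoder.decode(payload));
            } catch (IOException bad) {
                // Only this record is lost; the next frame is still readable.
            }
        }
        return good;
    }
}
```

With an unframed flat file, the same decoding failure would leave the reader at an unknown offset, with no way to find the start of the next record.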

> support for reading binary data from flat files
> -----------------------------------------------
>
>                 Key: HADOOP-4065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4065
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Joydeep Sen Sarma
>         Attachments: FlatFileReader.java, HADOOP-4065.0.txt, HADOOP-4065.1.txt, HADOOP-4065.1.txt, ThriftFlatFile.java
>
>
> like textinputformat - looking for a concrete implementation to read binary records
> from a flat file (that may be compressed).
> it's assumed that hadoop can't split such a file. so the inputformat can set splittable
> to false.
> tricky aspects are:
> - how to know what class the file contains (has to be in a configuration somewhere).
> - how to determine EOF (would be nice if hadoop can determine EOF and not have the
>   deserializer throw an exception, which is hard to distinguish from an exception due
>   to corruption). this is easy for non-compressed streams. for compressed streams,
>   DecompressorStream has a useful looking getAvailable() call, except the class is
>   marked package private.
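
One possible way to tell a clean EOF apart from a truncated or corrupt record, without relying on getAvailable(), is to peek one byte ahead before handing the stream to the deserializer. The sketch below uses only java.io and assumes the underlying stream (compressed or not) follows the standard InputStream contract of returning -1 from read() at end of data. The class name FlatRecordReader and the record layout (a 4-byte int followed by an 8-byte long) are hypothetical stand-ins for a real deserializer.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for a flat file of fixed-layout binary records.
final class FlatRecordReader {
    private final PushbackInputStream in;
    private final DataInputStream data;

    FlatRecordReader(InputStream raw) {
        // A one-byte pushback buffer is enough to peek at the next byte.
        this.in = new PushbackInputStream(raw, 1);
        // DataInputStream does no buffering of its own, so it stays in step with 'in'.
        this.data = new DataInputStream(in);
    }

    // Returns false only at a record boundary where the stream is exhausted.
    boolean hasNext() throws IOException {
        int b = in.read();
        if (b < 0) {
            return false; // clean EOF: no bytes left before the next record
        }
        in.unread(b); // put the byte back for the deserializer
        return true;
    }

    // Stand-in deserializer: throws EOFException if the record is cut short.
    long[] next() throws IOException {
        int id = data.readInt();
        long value = data.readLong();
        return new long[] { id, value };
    }

    // Reads all records; an EOFException escaping from here means truncation, not normal EOF.
    static List<long[]> readAll(InputStream raw) throws IOException {
        FlatRecordReader reader = new FlatRecordReader(raw);
        List<long[]> records = new ArrayList<>();
        while (reader.hasNext()) {
            records.add(reader.next());
        }
        return records;
    }
}
```

With this arrangement the end of the file is reported by hasNext(), so any exception thrown by next() can be treated as a sign of a damaged file rather than the normal end of input.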

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

