hadoop-hdfs-issues mailing list archives

From "Mitchell Gudmundson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
Date Tue, 26 Apr 2016 19:17:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258734#comment-15258734 ]

Mitchell Gudmundson commented on HDFS-10327:


Unless I'm mistaken, this is not a Spark-specific issue. Even when running simple MapReduce
jobs you end up with a directory of part files part-r, where r is the reducer number. These
directories are generally meant to be interpreted as one logical "file". In the Spark world,
when writing out an RDD or DataFrame you get a part file per partition (just as you would
per reducer in the MR framework), so the concept is no different from that on other
distributed processing engines. It seems that one would want to be able to retrieve the
contents of the various parts as a whole.
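
To make the idea concrete, here is a rough client-side sketch of what "retrieving the parts as a whole" could look like today, using only the existing WebHDFS LISTSTATUS and OPEN operations. It is not what the issue proposes to implement inside WebHDFS, just an illustration under assumptions: the function names (webhdfs_url, part_files, read_directory_as_file) and the namenode address are hypothetical, authentication is left out, and the part files are assumed to have zero-padded names beginning with "part-" so that a plain sort gives the right order.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def webhdfs_url(base, path, op):
    """Build a WebHDFS REST URL of the form <base>/webhdfs/v1/<path>?op=<OP>."""
    return f"{base.rstrip('/')}/webhdfs/v1/{quote(path.strip('/'))}?op={op}"


def http_get(url):
    """Fetch a URL and return the body as bytes (urlopen follows redirects)."""
    with urlopen(url) as resp:
        return resp.read()


def part_files(listing):
    """Return the sorted part-file names from a parsed LISTSTATUS response.

    Marker files such as _SUCCESS and any subdirectories are skipped.
    """
    statuses = listing["FileStatuses"]["FileStatus"]
    return sorted(
        s["pathSuffix"]
        for s in statuses
        if s["type"] == "FILE" and s["pathSuffix"].startswith("part-")
    )


def read_directory_as_file(base, directory, get=http_get):
    """Concatenate all part files of a job output directory, in name order.

    `get` is injectable so the logic can be exercised without a cluster.
    """
    listing = json.loads(get(webhdfs_url(base, directory, "LISTSTATUS")))
    chunks = []
    for name in part_files(listing):
        part_path = f"{directory.rstrip('/')}/{name}"
        chunks.append(get(webhdfs_url(base, part_path, "OPEN")))
    return b"".join(chunks)
```

Against a real cluster this would be called as read_directory_as_file("http://<namenode>:<port>", "/path/to/output"), with the placeholders filled in for your environment. Everything is read into memory, so it only suits small outputs; a streaming variant would be needed for anything large.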


> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> --------------------------------------------------------------------
>                 Key: HDFS-10327
>                 URL: https://issues.apache.org/jira/browse/HDFS-10327
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Thomas Hille
>              Labels: features
> When Spark saves a file in HDFS, it creates a directory which includes many parts of the
> file. When you read it with Spark programmatically, you can read this directory as if it
> were a normal file.
> If you try to read this directory-style file in WebHDFS, it returns
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path is not a file: [...]

This message was sent by Atlassian JIRA
