hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10327) Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
Date Tue, 26 Apr 2016 18:01:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258567#comment-15258567 ]

Chris Nauroth commented on HDFS-10327:
--------------------------------------

It looks like in that example, myfile.csv is a directory, and its contents are 3 files: _SUCCESS,
part-00000 and part-00001.  Attempting to open myfile.csv directly as a file definitely won't
work.  If Spark has a feature that lets you "open" it directly, then perhaps this is implemented
at the application layer by Spark?  Maybe it does something equivalent to
{{hdfs dfs -cat myfile.csv/part*}}?

That last example demonstrates the separation of concerns I'm talking about: the Hadoop shell
command performs glob expansion to identify all files matching a pattern, and then it opens
and displays each file separately, using HDFS APIs that operate on individual file paths.
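
The same split can be sketched against the WebHDFS REST API: the client lists the directory, filters the names against a pattern, and then opens each matching file by its own path. This is only an illustrative sketch, not something WebHDFS or Spark provides. The function names ({{read_parts}}, {{http_get}}), the NameNode address and the directory path in the usage note are all assumptions; only the {{LISTSTATUS}} and {{OPEN}} operations and the {{FileStatuses}}/{{FileStatus}}/{{pathSuffix}}/{{type}} fields come from the WebHDFS API itself.

```python
import fnmatch
import json
import urllib.request
from typing import Callable, Iterator


def http_get(url: str) -> bytes:
    """Fetch a URL and return the body; urllib follows the redirect that OPEN issues."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def read_parts(base_url: str, directory: str, pattern: str = "part*",
               fetch: Callable[[str], bytes] = http_get) -> Iterator[bytes]:
    """Yield the contents of each file in `directory` whose name matches `pattern`.

    Hypothetical helper: the glob expansion happens here, on the client side,
    and every OPEN request targets one individual file path.
    """
    dir_url = f"{base_url}/webhdfs/v1/{directory.strip('/')}"

    # Step 1: list the directory (the "glob expansion" half of the job).
    listing = json.loads(fetch(f"{dir_url}?op=LISTSTATUS"))
    statuses = listing["FileStatuses"]["FileStatus"]
    names = sorted(
        s["pathSuffix"] for s in statuses
        if s["type"] == "FILE" and fnmatch.fnmatchcase(s["pathSuffix"], pattern)
    )

    # Step 2: open each matching file separately, in name order.
    for name in names:
        yield fetch(f"{dir_url}/{name}?op=OPEN")
```

Assuming a NameNode reachable at a made-up address such as {{http://namenode.example.com:50070}}, calling {{b"".join(read_parts("http://namenode.example.com:50070", "/data/myfile.csv"))}} would concatenate the part files in name order while skipping {{_SUCCESS}}, which is roughly what the shell command above does. The {{fetch}} parameter exists only so the HTTP layer can be swapped out, for example in tests.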

> Open files in WEBHDFS which are stored in folders by Spark/Mapreduce
> --------------------------------------------------------------------
>
>                 Key: HDFS-10327
>                 URL: https://issues.apache.org/jira/browse/HDFS-10327
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Thomas Hille
>              Labels: features
>
> When Spark saves a file in HDFS, it creates a directory that contains the many parts of the file. When you read it with Spark programmatically, you can read this directory as if it were a normal file.
> If you try to read this directory-style file in WebHDFS, it returns
> {"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"Path is not a file: [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
