spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-2700) Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
Date Sat, 26 Jul 2014 09:01:46 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075309#comment-14075309 ]

Sean Owen commented on SPARK-2700:
----------------------------------

(As a general aside: yes, apps should never consume or read hidden "." files in HDFS
by default. The convention is the same as on Linux; it is not specific to Impala.)
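
A minimal sketch of that convention, in plain Python rather than Spark's Scala, and not the fix that went into Spark. It assumes "hidden" means any path component whose name begins with "."; the helper names `is_hidden` and `visible_paths` and the `part-00000.parquet` file name are hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def is_hidden(path: str) -> bool:
    """Return True if any component of the path starts with a dot."""
    # Drop a scheme and authority such as hdfs://host:8020 before splitting.
    parts = PurePosixPath(urlparse(path).path).parts
    # Skip the root and ".." so that only real names are checked.
    return any(p.startswith(".") for p in parts if p not in ("/", ".."))


def visible_paths(paths):
    """Keep only the paths that contain no hidden component."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical directory listing for the table from the report.
listing = [
    "hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging",
    "hdfs://xxx:8020/user/hive/warehouse/parquet_strings/part-00000.parquet",
]

for path in visible_paths(listing):
    print(path)
```

Checking every component, not only the last one, also skips anything stored underneath a hidden directory. Files beginning with "_" are deliberately left alone here, since the comment above only concerns "." names.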

> Hidden files (such as .impala_insert_staging) should be filtered out by sqlContext.parquetFile
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2700
>                 URL: https://issues.apache.org/jira/browse/SPARK-2700
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.0.1
>            Reporter: Teng Qiu
>
> When a table is created in Impala, a hidden folder, .impala_insert_staging, is created in the table's folder.
> If we then load such a table with the Spark SQL API sqlContext.parquetFile, this hidden folder causes trouble: Spark tries to read metadata from it, and you will see this exception:
> {code:borderStyle=solid}
> Caused by: java.io.IOException: Could not read footer for file FileStatus{path=hdfs://xxx:8020/user/hive/warehouse/parquet_strings/.impala_insert_staging; isDirectory=true; modification_time=1406333729252; access_time=0; owner=hdfs; group=hdfs; permission=rwxr-xr-x; isSymlink=false}
> ...
> ...
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /user/hive/warehouse/parquet_strings/.impala_insert_staging
> {code}
> The Impala side does not consider this their problem: https://issues.cloudera.org/browse/IMPALA-837 (IMPALA-837 Delete .impala_insert_staging directory after INSERT)
> So maybe we should filter out these hidden folders and files when reading Parquet tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
