drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-1906) Parquet reader error when reading a subdirectory
Date Tue, 24 Feb 2015 07:42:11 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Phillips updated DRILL-1906:
-----------------------------------
    Fix Version/s: 0.9.0

> Parquet reader error when reading a subdirectory
> ------------------------------------------------
>
>                 Key: DRILL-1906
>                 URL: https://issues.apache.org/jira/browse/DRILL-1906
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Aman Sinha
>            Assignee: Steven Phillips
>             Fix For: 0.9.0
>
>
> I am not sure whether this is a regression, but on the current master branch Drill is
> unable to read subdirectories when there are Parquet files in both the parent directory
> and the subdirectory. It tries to read the footer of the subdirectory itself instead of
> recursing into it. JSON works fine.
> For example, here's my directory structure: 
> {code}
>  ls -lR /tmp/foo1
> -rw-r--r--  1 asinha  wheel  132 Dec 20 11:10 0_0_0.parquet
> drwxr-xr-x  3 asinha  wheel  102 Dec 20 09:54 foo2
> /tmp/foo1/foo2:
> -rw-r--r--  1 asinha  wheel  132 Dec 16 16:14 0_0_0.parquet
> {code}
> Here's the failure and stack trace: 
> {code}
> 0: jdbc:drill:zk=local> select * from foo1;
> Query failed: Query failed: Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillTableRule, args [rel#660:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[dfs, tmp, foo1])]
> <skip>
> Caused by: java.io.IOException: Could not read footer: java.io.IOException: Could not read footer for file DeprecatedRawLocalFileStatus{path=file:/tmp/foo1/foo2; isDirectory=true; modification_time=1419098040000; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
>         at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:195) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:208) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:224) ~[parquet-hadoop-1.5.1-drill-r4.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.readFooter(ParquetGroupScan.java:208) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> {code}
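
The report says the reader tries to read a footer for the subdirectory entry itself rather than descending into it. As a rough illustration of the expected behavior (not Drill's actual code or fix), the sketch below expands a directory into its Parquet files by recursing into subdirectories, so that only regular files would ever be handed to a footer reader. It uses plain java.nio on a local filesystem instead of the Hadoop FileSystem API, and the class name ParquetFileLister and method listParquetFiles are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Drill: collects Parquet files under a root
// directory, descending into subdirectories instead of treating them as files.
class ParquetFileLister {

    static List<Path> listParquetFiles(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    // Recurse: a directory has no footer of its own.
                    result.addAll(listParquetFiles(entry));
                } else if (entry.getFileName().toString().endsWith(".parquet")) {
                    result.add(entry);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ParquetFileLister <directory>");
            return;
        }
        for (Path file : listParquetFiles(Paths.get(args[0]))) {
            System.out.println(file);
        }
    }
}
```

Run against the layout shown in the report (/tmp/foo1 with a nested foo2), this would list both 0_0_0.parquet files and never the foo2 directory entry.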



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
