drill-issues mailing list archives

From "subbu srinivasan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4352) Query fails on single corrupted parquet column
Date Tue, 03 May 2016 21:23:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269615#comment-15269615 ]

subbu srinivasan commented on DRILL-4352:

This is a valid issue. Is anyone working on this?

> Query fails on single corrupted parquet column
> ----------------------------------------------
>                 Key: DRILL-4352
>                 URL: https://issues.apache.org/jira/browse/DRILL-4352
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Monitoring, Storage - Parquet
>    Affects Versions: 1.4.0
>            Reporter: F Méthot
> Getting this error when querying a corrupted Parquet file.
> Fragment 1:9
> A single corrupt file among 1000s will cause a query to break.
> Encountering a corrupt file should be logged and should not spoil a query.
> It would have been useful if the log clearly specified which Parquet file is causing the issue.
> Response from Ted Dunning:
> This is a lot like the problem of encountering bad lines in a line oriented file such as CSV or JSON.
> Drill doesn't currently have a good mechanism for skipping bad input. Or rather, it has reasonably good mechanisms, but it doesn't use them well.
> I think that this is a very reasonable extension of the problem of dealing with individual bad records and should be handled somehow by the parquet scanner.

This message was sent by Atlassian JIRA
