drill-issues mailing list archives

From "Sudheesh Katkam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3625) Dynamic Format Detection in DFS backend for unmapped file extensions / files without extensions
Date Tue, 11 Aug 2015 16:57:45 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682076#comment-14682076 ]

Sudheesh Katkam commented on DRILL-3625:
----------------------------------------

I agree with what you mentioned in the JIRA description. I was pointing you to a starter :)

> Dynamic Format Detection in DFS backend for unmapped file extensions / files without extensions
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3625
>                 URL: https://issues.apache.org/jira/browse/DRILL-3625
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - JSON, Storage - Other, Storage - Parquet, Storage - Text & CSV
>    Affects Versions: 1.1.0
>            Reporter: Hari Sekhon
>            Assignee: Steven Phillips
>
> When querying a JSON file that doesn't have a ".json" extension (such as ".log"), I get this exception:
> {code}0: jdbc:drill:zk=local> select * from dfs.down.`auditOut.log` limit 1;
> Aug 11, 2015 4:01:38 PM org.apache.calcite.sql.validate.SqlValidatorException <init>
> SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table 'dfs.down.auditOut.log' not found
> Aug 11, 2015 4:01:38 PM org.apache.calcite.runtime.CalciteException <init>
> SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
> Error: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs.down.auditOut.log' not found
> [Error Id: 5610210b-3eb2-497f-9443-c725b29733b6 on <host>:31010] (state=,code=0)
> {code}
> However, when the file is renamed to have a .json extension, the query succeeds.
> I could reconfigure the DFS plugin to map all files with the *.log extension to JSON, but this doesn't seem like the right thing to do. I could of course rename the file to have a .json extension, which is the better option, but this raises another question: why doesn't this just work as-is?
> Hence I'd like to raise this as a feature request: when Drill encounters a file with an unmapped extension or no extension at all, it should run a few quick checks on the file type and then use the appropriate storage backend for the file.
> Adding this "Dynamic Format Detection", as I have dubbed it, would tie in nicely with Drill's style and existing features such as the dynamic schema detection already used for JSON.
> This may also come in handy for outputs from MapReduce jobs, where files may be named part-m-NNNNN or part-r-NNNNN without any extension; if those files were text, for example, Drill could immediately apply the text storage backend to them.
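The "few quick checks on the file type" proposed above could, as one possibility, amount to sniffing a small prefix of the file. The sketch below is a hypothetical illustration, not Drill's implementation: the functions `detect_format` and `detect_file_format`, the 4096-byte sample size, and the returned labels ("parquet", "json", "tsv", "csv", "text", "unknown") are all assumptions chosen for the example. The only hard fact it relies on is that Parquet files begin with the 4-byte magic `PAR1`; the JSON and delimiter checks are simple heuristics.

```python
PARQUET_MAGIC = b"PAR1"  # Parquet files begin (and end) with these 4 bytes


def detect_format(head: bytes) -> str:
    """Guess a format label from the first bytes of a file.

    Hypothetical heuristic for illustration only; not Drill's API.
    Returns "parquet", "json", "tsv", "csv", "text", or "unknown".
    """
    if not head:
        return "unknown"  # nothing to sniff
    if head.startswith(PARQUET_MAGIC):
        return "parquet"
    try:
        sample = head.decode("utf-8")
    except UnicodeDecodeError as err:
        # A prefix cut mid-character fails only in its last few bytes;
        # an earlier failure means the data is not UTF-8 text.
        if err.start < len(head) - 3:
            return "unknown"
        sample = head[: err.start].decode("utf-8")
    if "\x00" in sample:
        return "unknown"  # NUL bytes suggest a binary format we don't know
    # Assumed heuristic: JSON records start with an object or array.
    if sample.lstrip().startswith(("{", "[")):
        return "json"
    # Assumed heuristic: pick a delimiter from the first line.
    first_line = sample.splitlines()[0] if sample else ""
    if "\t" in first_line:
        return "tsv"
    if "," in first_line:
        return "csv"
    return "text"


def detect_file_format(path: str, sample_size: int = 4096) -> str:
    """Read only a small prefix of the file so the check stays cheap."""
    with open(path, "rb") as f:
        return detect_format(f.read(sample_size))
```

Under these assumptions, a MapReduce `part-m-NNNNN` file written by Hadoop's `TextOutputFormat` (tab-separated key/value lines by default) would be labelled "tsv", and `auditOut.log` from the report would be labelled "json" if its first record starts with `{`. Reading a fixed-size prefix keeps the check quick; a stricter Parquet check could also verify the trailing magic bytes.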



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
