drill-issues mailing list archives

From "Mehant Baid (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-2424) Ignore hidden files in directory path
Date Tue, 22 Sep 2015 14:47:04 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehant Baid resolved DRILL-2424.
--------------------------------
    Resolution: Duplicate
      Assignee: Mehant Baid  (was: Steven Phillips)

> Ignore hidden files in directory path
> -------------------------------------
>
>                 Key: DRILL-2424
>                 URL: https://issues.apache.org/jira/browse/DRILL-2424
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON, Storage - Text & CSV
>    Affects Versions: 0.7.0
>            Reporter: Andries Engelbrecht
>            Assignee: Mehant Baid
>             Fix For: 1.2.0
>
>
> When streaming data to the DFS, some records can be incomplete during the temporary write
> phase for the last file(s). These files typically have a different extension such as '.tmp' or
> can be marked hidden with a '.' prefix.
> Querying the directory path with Drill then causes a query error, because some records
> may not be complete in the temporary files. The ability to have Drill ignore hidden
> files and/or read only files with a designated extension in the workspace would resolve this
> problem.
> An example is using Flume to stream JSON files to a directory structure: the HDFS sink creates
> .tmp files (which can be hidden with a '.' prefix) that contain incomplete JSON objects until the
> file is closed and the .tmp extension (or prefix) is removed. Attempting to query the directory
> structure with Drill then results in errors due to the incomplete JSON object(s) in the .tmp
> files.
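
The filtering requested above can be illustrated with a minimal sketch. It is not Drill's implementation; the function names, the example paths, and the default extension list are all hypothetical. It only shows the two rules the description asks for: skip names with a '.' prefix, and keep only files with a designated extension.

```python
from pathlib import PurePosixPath


def is_queryable(path, allowed_extensions=(".json",)):
    """Return True if a file should be read by a query (hypothetical helper).

    Assumed rules, taken from the issue description:
      1. files whose name starts with '.' are hidden and skipped;
      2. only files ending in one of the designated extensions are read,
         so in-progress files such as 'events.json.tmp' are skipped.
    """
    name = PurePosixPath(path).name
    if name.startswith("."):
        return False
    return name.lower().endswith(tuple(allowed_extensions))


def filter_queryable(paths, allowed_extensions=(".json",)):
    """Keep only the paths that pass is_queryable, preserving order."""
    return [p for p in paths if is_queryable(p, allowed_extensions)]


# Illustrative directory listing; these paths are made up for the example.
listing = [
    "/flume/events/a.json",
    "/flume/events/b.json.tmp",
    "/flume/events/.c.json.tmp",
    "/flume/events/.d.json",
]
readable = filter_queryable(listing)
```

With this listing, only the closed file `a.json` remains; the `.tmp` file and both hidden files are excluded.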



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
