Date: Tue, 10 Mar 2015 23:32:39 +0000 (UTC)
From: "Andries Engelbrecht (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Created] (DRILL-2424) Ignore hidden files in directory path

Andries Engelbrecht created DRILL-2424:
------------------------------------------

             Summary: Ignore hidden files in directory path
                 Key: DRILL-2424
                 URL: https://issues.apache.org/jira/browse/DRILL-2424
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - JSON, Storage - Text & CSV
    Affects Versions: 0.7.0
            Reporter: Andries Engelbrecht
            Assignee: Steven Phillips

When streaming data to the DFS, some records can be incomplete during the temporary write phase for the last file(s). These files typically have a different extension such as '.tmp', or are marked hidden with a '.' prefix.
Querying the directory path with Drill then fails with a query error, because some records in the temporary files are incomplete. The ability to have Drill ignore hidden files and/or read only files with designated extensions in the workspace would resolve this problem.

For example, when Flume streams JSON files to a directory structure, the HDFS sink creates .tmp files (which can be hidden with a '.' prefix) that contain incomplete JSON objects until the file is closed and the .tmp extension (or prefix) is removed. Attempting to query the directory structure with Drill then results in errors due to the incomplete JSON object(s) in the .tmp files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
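
To illustrate the requested behavior, here is a minimal Python sketch of the file selection described in the issue: skip files whose names start with '.', and optionally keep only files with designated extensions. This is not Drill code. The function names `is_queryable` and `queryable_files` and the `allowed_extensions` parameter are hypothetical, chosen only for this illustration.

```python
import os


def is_queryable(name, allowed_extensions=None):
    """Return True if a file name should be included in a directory scan.

    Hypothetical helper: hidden files (leading '.') are always skipped.
    If allowed_extensions is given (e.g. {".json"}), only files whose
    final extension is in that set are kept, which also drops '.tmp'.
    """
    if name.startswith("."):
        return False
    if allowed_extensions is not None:
        ext = os.path.splitext(name)[1].lower()
        if ext not in allowed_extensions:
            return False
    return True


def queryable_files(root, allowed_extensions=None):
    """Yield paths under root that pass is_queryable, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if is_queryable(name, allowed_extensions):
                yield os.path.join(dirpath, name)
```

With `allowed_extensions={".json"}`, a completed `events.json` would be selected, while an in-progress `events.json.tmp` or a hidden `.events.json.tmp` would be skipped.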