Date: Tue, 27 Oct 2015 15:43:27 +0000 (UTC)
From: "Jacques Nadeau (JIRA)"
To: issues@drill.apache.org
Subject: [jira] [Comment Edited] (DRILL-3572) Provide a simple interface to append metadata to files and directories (.drill)

[ https://issues.apache.org/jira/browse/DRILL-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976576#comment-14976576 ]

Jacques Nadeau edited comment on DRILL-3572 at 10/27/15 3:42 PM:
-----------------------------------------------------------------

For some formats (for example text), it would be important to be able to provide field names and field types.
This seems to be a property of the format rather than a general concept that must be available for all formats. For example:
{code}
{
  // Drill version identifier
  version: "dd1",

  // Format Plugin Configuration
  format: {
    type: "text",
    fieldDelimiter: ",",
    lineDelimiter: "\n",
    recordStructure: [
      {name: "name", type: "VARCHAR(255)"},
      {name: "email", type: "VARCHAR(255)"},
      {name: "age", type: "INT"},
      {name: "registrationDate", type: "TIMESTAMP"}
    ]
  },

  // Option that describes what to do if we fail to read a record
  // options include: warn, fail, record
  recordFailureMode: "warn",

  // Option that describes what to do if we fail to parse a file
  // options include: warn, fail, record
  fileFailureMode: "warn"
}
{code}

> Provide a simple interface to append metadata to files and directories (.drill)
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-3572
>                 URL: https://issues.apache.org/jira/browse/DRILL-3572
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>            Reporter: Jacques Nadeau
>             Fix For: Future
>
>
> We need a way to store small amounts of metadata about a file or a collection of files. The current thinking is to have a "dot drill file" that ascribes metadata to a particular asset.
> An initial example file might include the following:
> {code}
> {
>   // Drill version identifier
>   version: "dd1",
>
>   // Format Plugin Configuration
>   format: {
>     type: "httpd",
>     format: "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\""
>   },
>
>   // Traits of the underlying data (a.k.a. physical properties)
>   traits: [
>     {type: "sort_nulls_first", columns: ["request.uri", "client.host"]},
>     {type: "unique", columns: ["abc"]},
>     {type: "unique", columns: ["xy", "zz"]}
>   ],
>
>   // Mappings between directory names and exposed columns
>   dirs: [
>     {skip: true}, // don't include this directory name in the directory path.
>     {name: "year", type: "integer"},
>     {name: "month", type: "integer"},
>     {name: "day", type: "integer"}
>   ],
>
>   // Whether or not a user can add new columns to the table through insert
>   rigid_table: true
> }
> {code}
> We also need to support adding more machine-generated/managed data such as statistics. That should be stored in a separate file from the human-written description.
> A user should be able to ascribe this metadata directly through the file system as well as through SQL commands such as
> {code}
> ALTER TABLE ADD METADATA ...
> {code}
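A minimal Python sketch of how a reader might apply the `recordStructure` and `recordFailureMode` options from the edited comment, plus the `dirs` mapping from the issue description. Everything here is hypothetical and not Drill's implementation: the function names (`cast_value`, `read_text`, `map_dirs`), the casting rules, the plain split on `fieldDelimiter` (no quoting or escaping), the reading of the failure modes (`warn` skips the line and logs a message, `record` sets the raw line aside, `fail` aborts), and the reading of `{skip: true}` as "do not expose this level as a column" are all assumptions. The .drill content is shown already loaded as Python dicts, since its syntax (unquoted keys, comments) is not strict JSON.

```python
from datetime import datetime

# Hypothetical in-memory form of the text-format .drill example (assumed layout).
EXAMPLE_FORMAT = {
    "type": "text",
    "fieldDelimiter": ",",
    "lineDelimiter": "\n",
    "recordStructure": [
        {"name": "name", "type": "VARCHAR(255)"},
        {"name": "email", "type": "VARCHAR(255)"},
        {"name": "age", "type": "INT"},
        {"name": "registrationDate", "type": "TIMESTAMP"},
    ],
}

# Hypothetical in-memory form of the `dirs` example (assumed layout).
EXAMPLE_DIRS = [
    {"skip": True},
    {"name": "year", "type": "integer"},
    {"name": "month", "type": "integer"},
    {"name": "day", "type": "integer"},
]

FAILURE_MODES = ("warn", "fail", "record")


def cast_value(raw, sql_type):
    """Convert one raw text field to its declared type (assumed casting rules)."""
    base = sql_type.split("(")[0].upper()
    if base == "VARCHAR":
        # Enforce the declared maximum length, e.g. VARCHAR(255).
        if "(" in sql_type:
            limit = int(sql_type[sql_type.index("(") + 1:-1])
            if len(raw) > limit:
                raise ValueError(f"value longer than {limit} characters")
        return raw
    if base == "INT":
        return int(raw)
    if base == "TIMESTAMP":
        return datetime.fromisoformat(raw)
    raise ValueError(f"unsupported type {sql_type}")


def read_text(text, fmt, record_failure_mode="warn"):
    """Split text into typed records, applying the assumed failure mode per line."""
    if record_failure_mode not in FAILURE_MODES:
        raise ValueError(f"unknown recordFailureMode {record_failure_mode!r}")
    fields = fmt["recordStructure"]
    records, warnings, rejected = [], [], []
    for lineno, line in enumerate(text.split(fmt["lineDelimiter"]), start=1):
        if not line:
            continue  # ignore blank lines, including the one after a trailing delimiter
        values = line.split(fmt["fieldDelimiter"])  # no quoting/escaping support
        try:
            if len(values) != len(fields):
                raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
            record = {f["name"]: cast_value(v, f["type"]) for f, v in zip(fields, values)}
        except ValueError as err:
            if record_failure_mode == "fail":
                raise ValueError(f"line {lineno}: {err}") from err
            if record_failure_mode == "warn":
                warnings.append(f"line {lineno}: {err}")
            else:  # "record": keep the raw line for later inspection
                rejected.append((lineno, line))
            continue
        records.append(record)
    return records, warnings, rejected


def map_dirs(relative_dir, dirs):
    """Expose directory names (outermost first) as columns per the `dirs` list."""
    parts = [p for p in relative_dir.split("/") if p]
    columns = {}
    for spec, part in zip(dirs, parts):
        if spec.get("skip"):
            continue  # this level is not exposed as a column
        columns[spec["name"]] = int(part) if spec["type"] == "integer" else part
    return columns
```

For example, `map_dirs("logs/2015/10/27", EXAMPLE_DIRS)` returns `{'year': 2015, 'month': 10, 'day': 27}`. In this sketch both `warn` and `record` keep bad lines out of the returned records and differ only in what is kept for later inspection; a `fileFailureMode` could plausibly follow the same pattern one level up.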