Date: Sun, 20 Sep 2015 00:21:04 +0000 (UTC)
From: "Jacques Nadeau (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-3808) Let TextReader have the option to treat double quote as a literal

[ https://issues.apache.org/jira/browse/DRILL-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877351#comment-14877351 ]

Jacques Nadeau commented on DRILL-3808:
---------------------------------------

Why would that affect anything else? The quote is configurable at the format plugin level.
Note here: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java#L134

Configuration options exist for the field and line delimiters as well as for quote, escape, comment and skipFirstLine.

> Let TextReader have the option to treat double quote as a literal
> -----------------------------------------------------------------
>
>                 Key: DRILL-3808
>                 URL: https://issues.apache.org/jira/browse/DRILL-3808
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>            Reporter: Sean Hsuan-Yi Chu
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>
> According to references [1], [2]:
> In .csv, the double quote is a special character because it can optionally enclose a text field. In .tsv, however, it is not a special character: it can appear anywhere, and when it does, it should be treated as a literal. The tsv format specification also does not allow tab or CR/LF characters to appear inside text fields. However, Drill treats tsv the same as csv.
> For example, given the data:
> {code}
> "test"\t"test"
> {code}
> and the query: select columns[0], columns[1] from `t.tsv`; Drill returns
> {code}
> test test
> {code}
> However, according to reference [2], the result should be
> {code}
> "test" "test"
> {code}
> Ideally, Drill should follow the standard; see [2].
> [1] CSV - https://tools.ietf.org/html/rfc4180
> [2] TSV - http://www.iana.org/assignments/media-types/text/tab-separated-values

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
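
To illustrate the quoting difference described in the issue, here is a minimal sketch that uses Python's standard csv module rather than Drill itself. It parses the sample row from the report twice: once with the double quote treated as an enclosing character (the CSV-style behaviour the reporter observed) and once with quote processing disabled (the TSV-style behaviour the reporter expected). The helper name parse and its flag treat_quote_as_literal are hypothetical and exist only for this example; they are not Drill option names.

```python
import csv
import io

# The sample row from the issue: two fields, each wrapped in double quotes,
# separated by a single tab character.
SAMPLE = '"test"\t"test"\n'


def parse(text, treat_quote_as_literal):
    """Split tab-separated text into rows.

    treat_quote_as_literal is a hypothetical flag for this sketch only;
    it does not correspond to any Drill configuration key.
    """
    # QUOTE_NONE tells the reader to do no special processing of quote
    # characters; QUOTE_MINIMAL (the default) lets quotes enclose a field.
    quoting = csv.QUOTE_NONE if treat_quote_as_literal else csv.QUOTE_MINIMAL
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return [row for row in reader]


csv_style = parse(SAMPLE, treat_quote_as_literal=False)
tsv_style = parse(SAMPLE, treat_quote_as_literal=True)

print(csv_style)  # [['test', 'test']]
print(tsv_style)  # [['"test"', '"test"']]
```

The first result matches what the issue says Drill returns today, and the second matches what reference [2] calls for. Whether setting the format plugin's quote option achieves the second behaviour in Drill is the question this thread is discussing, so the sketch takes no position on it.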