Subject: Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters
From: Sergio Pena
To: Sergio Pena, Xuefu Zhang, Szehon Ho, Naveen Gangam
Cc: hive, Marta Kuczora
Date: Thu, 18 Aug 2016 00:11:01 -0000
Message-ID: <20160818001101.17019.8402@reviews.apache.org>
In-Reply-To: <20160817141405.17020.27097@reviews.apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review146056
-----------------------------------------------------------

What about dropping Super CSV, so that the existing 'dsv' format can support both single-character and multi-character delimiters? I don't like adding a separate 'dsv2' format just for the multi-character case. It might be confusing for users.

- Sergio Pena


On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
>
> (Updated Aug. 17, 2016, 2:14 p.m.)
>
>
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
>
>
> Bugs: HIVE-14404
>     https://issues.apache.org/jira/browse/HIVE-14404
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Introduced a new outputformat (dsv2) which supports multiple characters as the delimiter.
> The dsv, csv2 and tsv2 outputformats are generated with the Super CSV library, which doesn't support multi-character delimiters. Since csv2, tsv2 and dsv share the same generation logic, I decided not to change that logic but rather to introduce a new outputformat (dsv2) which supports multiple characters as the delimiter.
> The new dsv2 outputformat uses the same escaping logic as the dsv outputformat when quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps that use multiple characters as the delimiter.
>
> Main changes in the code:
> - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat.
>
> - Kept the methods used by both children in SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class.
>
> - Didn't change the logic that was in SeparatedValuesOutputFormat, only moved some parts to the child class.
>
> - Implemented the value escaping and the concatenation with the delimiter string in MultiCharSeparatedValuesOutputFormat.
>
>
> Diffs
> -----
>
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION
>   beeline/src/main/resources/BeeLine.properties 95b8fa1
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733
>
> Diff: https://reviews.apache.org/r/50896/diff/
>
>
> Testing
> -------
>
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps that use multiple characters as the delimiter.
>
>
> Thanks,
>
> Marta Kuczora
>
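
For readers who want a concrete picture of the "value escaping and concatenation with the delimiter string" step described above, here is a minimal sketch. It is not the code from the patch: the class name MultiCharDelimiterSketch, the method names, and the quoting rule (wrap a value in double quotes when it contains the delimiter, a quote, or a line break, and double any embedded quotes) are assumptions chosen to resemble common CSV-style escaping. Treating null as an empty string is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the MultiCharSeparatedValuesOutputFormat
// class from the review request.
final class MultiCharDelimiterSketch {

    private static final char QUOTE = '"';

    // Returns the value unchanged unless quoting is enabled and the value
    // contains the delimiter, a quote character, or a line break. In that
    // case the value is wrapped in quotes and embedded quotes are doubled.
    static String escape(String value, String delimiter, boolean quotingEnabled) {
        if (value == null) {
            return ""; // assumption: nulls are rendered as empty strings
        }
        if (!quotingEnabled) {
            return value;
        }
        boolean needsQuotes = value.contains(delimiter)
                || value.indexOf(QUOTE) >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(QUOTE);
        sb.append(value.replace("\"", "\"\""));
        sb.append(QUOTE);
        return sb.toString();
    }

    // Escapes each value and concatenates the results with the (possibly
    // multi-character) delimiter string between them.
    static String joinRow(List<String> values, String delimiter, boolean quotingEnabled) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                row.append(delimiter);
            }
            row.append(escape(values.get(i), delimiter, quotingEnabled));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b|#|c", "say \"hi\"");
        System.out.println(joinRow(values, "|#|", true));
    }
}
```

Because the delimiter is handled as a plain string, the same two methods also cover the single-character case, which is roughly the direction the review comment above argues for.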