hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marta Kuczora <kuczo...@cloudera.com>
Subject Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters
Date Mon, 08 Aug 2016 15:13:28 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/
-----------------------------------------------------------

Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.


Repository: hive-git


Description
-------

Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library
doesn’t support multiple characters as delimiter. Since the same logic is used for generating
csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new
outputformat (dsv2) which supports multiple characters as delimiter. 
The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting
is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters
as delimiter.

Main changes in the code:
 - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new
child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat
and MultiCharSeparatedValuesOutputFormat

 - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and
moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat
class.

 - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only moved some
parts to the child class.

 - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat.


Diffs
-----

  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION

  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
  beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION

  itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733


Diff: https://reviews.apache.org/r/50896/diff/


Testing
-------

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters
as delimiter.


Thanks,

Marta Kuczora


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message