hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marta Kuczora <kuczo...@cloudera.com>
Subject Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters
Date Thu, 11 Aug 2016 07:59:54 GMT


> On Aug. 9, 2016, 5 a.m., Peter Vary wrote:
> > Hi Marta,
> > 
> > Thanks for the patch, it is nice, and clean.
> > It might be a good idea to have the inputs checked, so if the user provides a multicharacter
separator with a dsv format, then instead of using the first character of the string, an error
might be printed.
> > 
> > Otherwise looks good.
> > 
> > Thanks,
> > Peter

Hi Peter,

thanks a lot for the review.
Sure, I can change the patch to check the input and print an error if the format is dsv and
the separator contains multiple characters.

Regards,
Marta


> On Aug. 9, 2016, 5 a.m., Peter Vary wrote:
> > beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java, line 106
> > <https://reviews.apache.org/r/50896/diff/1/?file=1466246#file1466246line106>
> >
> >     nit Would not be better to change the type of the DEFAULT_DELIMITER_DSV to String?

If we change the type of the DEFAULT_DELIMITER_DSV to String, then we would need to do the
converting for the single-character delimiter case. What would be better I think is to create
a new default for dsv2, something like "String DEFAULT_DELIMITER_DSV2".


- Marta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145167
-----------------------------------------------------------


On Aug. 8, 2016, 3:13 p.m., Marta Kuczora wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This
library doesn’t support multiple characters as delimiter. Since the same logic is used for
generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce
a new outputformat (dsv2) which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the
quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters
as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and created
two new child classes to separate the logic for single-character and multi-character delimiters:
SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the SeparatedValuesOutputFormat
and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat
class.
> 
>  - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only moved
some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string in the
MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 
>   beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
PRE-CREATION 
>   itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733

> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> -------
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple
characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message