phoenix-issues mailing list archives

From "Josh Elser (JIRA)"
Subject [jira] [Commented] (PHOENIX-5258) Add support to parse header from the input CSV file as input columns for CsvBulkLoadTool
Date Fri, 03 May 2019 18:16:00 GMT


Josh Elser commented on PHOENIX-5258:

+        try(FSDataInputStream inputStream = Path(path))) {
+            String header = new BufferedReader(new InputStreamReader(inputStream)).readLine();
+            inputStream.close();
+            return header;
+        }
Closing the inputStream when you are using try-with-resources is unnecessary. Can you please
create the BufferedReader within the try-with-resources as well? e.g.
try (FSDataInputStream inputStream = Path(path));
     BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
  return reader.readLine();
}
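For reference, a minimal self-contained sketch of that pattern. It is only an illustration: a plain java.io InputStream stands in for the FSDataInputStream used in the patch, and the class and method names (HeaderReaderSketch, readHeader) are hypothetical, not taken from the patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class HeaderReaderSketch {
    // Returns the first line of the stream, or null if the stream is empty.
    // Both resources are declared in the try header, so they are closed
    // automatically (reader first, then the stream) without an explicit close().
    static String readHeader(InputStream source) throws IOException {
        try (InputStream inputStream = source;
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }
}
```

The charset is passed explicitly here so the result does not depend on the platform default; whether UTF-8 is the right choice for the tool is an assumption.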
Some test cases which look to be missing:
 * What happens if the user provides {{--header}} but there is no header in the CSV file? (should error)
 * What happens if the user provides both {{--header}} and {{--skip-header}}? (should error)
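
A rough sketch of the kind of checks those two tests would exercise. All names here (HeaderOptionCheckSketch, validateOptions, requireHeader) are hypothetical and not from the patch. The second check only catches an empty or blank first line; detecting a first line that is really data would need a comparison against the table's columns, which this sketch does not attempt.

```java
class HeaderOptionCheckSketch {
    // Rejects the combination of --header and --skip-header.
    static void validateOptions(boolean header, boolean skipHeader) {
        if (header && skipHeader) {
            throw new IllegalArgumentException(
                    "--header and --skip-header cannot be used together");
        }
    }

    // Rejects a missing or blank first line when --header was requested.
    // firstLine is whatever readLine() returned for the file (null if empty).
    static String requireHeader(String firstLine, String fileName) {
        if (firstLine == null || firstLine.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "--header was given but no header line was found in " + fileName);
        }
        return firstLine;
    }
}
```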

Looks pretty close otherwise. Good work.

> Add support to parse header from the input CSV file as input columns for CsvBulkLoadTool
> ----------------------------------------------------------------------------------------
>                 Key: PHOENIX-5258
>                 URL:
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Prashant Vithani
>            Priority: Minor
>             Fix For: 4.15.0, 5.1.0
>         Attachments: PHOENIX-5258-4.x-HBase-1.4.patch, PHOENIX-5258-master.patch
>          Time Spent: 40m
>  Remaining Estimate: 0h
> Currently, CsvBulkLoadTool does not support reading a header from the input CSV and expects
the content of the CSV to match the table schema. Header support can be added
to map the input columns to the schema dynamically.
> The proposed solution is to introduce another option for the tool, `--header`. If this
option is passed, the input column list is constructed by reading the first line of the input
CSV file.
>  * If there is only one file, read the header from the first line and generate the `ColumnInfo`
>  * If there are multiple files, read the header from all the files, and throw an error
if the headers across files do not match.
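
The multi-file rule above could be sketched roughly as follows. This is an illustration under assumptions, not the patch's implementation: the class and method names (HeaderConsistencySketch, commonHeader) are hypothetical, and the headers are assumed to have already been read into a map from file name to first line.

```java
import java.util.Map;

class HeaderConsistencySketch {
    // Returns the header shared by all files, or throws if any file is
    // missing a header or has a header that differs from the first file's.
    static String commonHeader(Map<String, String> headerByFile) {
        String expected = null;
        String expectedFile = null;
        for (Map.Entry<String, String> entry : headerByFile.entrySet()) {
            String file = entry.getKey();
            String header = entry.getValue();
            if (header == null) {
                throw new IllegalArgumentException("No header found in " + file);
            }
            if (expected == null) {
                // The first file seen defines the header all others must match.
                expected = header;
                expectedFile = file;
            } else if (!expected.equals(header)) {
                throw new IllegalArgumentException(
                        "Header of " + file + " does not match header of " + expectedFile);
            }
        }
        if (expected == null) {
            throw new IllegalArgumentException("No input files given");
        }
        return expected;
    }
}
```

The comparison is an exact string match; whether differences in whitespace or column-name case should count as a mismatch is left open here.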

