phoenix-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-5258) Add support to parse header from the input CSV file as input columns for CsvBulkLoadTool
Date Mon, 06 May 2019 21:55:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834224#comment-16834224 ]

Hadoop QA commented on PHOENIX-5258:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12967962/PHOENIX-5258-4.x-HBase-1.4.patch
  against 4.x-HBase-1.4 branch at commit bb1327ef89fb0844094470ada74cbe5071b43a0d.
  ATTACHMENT ID: 12967962

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 6 release audit warnings (more than the master's current 0 warnings).

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    +            stmt.execute("CREATE TABLE S.TABLE14 (ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR, TYPE VARCHAR, CATEGORY VARCHAR)");
    +                        "Headers in provided input files are different. Headers must be unique for all input files"
    +    static final Option SKIP_HEADER_OPT = new Option("k", "skip-header", false, "Skip the first line of CSV files (the header)");
    +    static final Option HEADER_OPT = new Option("p", "parse-header", false, "Parses the first line of CSV as the header");
    +    private List<String> parseCsvHeaders(CommandLine cmdLine, Configuration conf) throws IOException {
    +                "Headers in provided input files are different. Headers must be unique for all input files"
    +    private List<String> fetchAllHeaders(Iterable<String> paths, Configuration conf) throws IOException {

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
     ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.UpgradeIT
     ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT
     ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.IndexRebuildTaskIT
     ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.join.HashJoinMoreIT

Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/2559//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/2559//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/2559//console

This message is automatically generated.

> Add support to parse header from the input CSV file as input columns for CsvBulkLoadTool
> ----------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5258
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Prashant Vithani
>            Priority: Minor
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: PHOENIX-5258-4.x-HBase-1.4.patch, PHOENIX-5258-master.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, CsvBulkLoadTool does not support reading a header from the input CSV and expects the content of the CSV to match the table schema. Header support can be added to dynamically map the schema to the header.
> The proposed solution is to introduce another option for the tool, `--header`. If this option is passed, the input column list is constructed by reading the first line of the input CSV file.
>  * If there is only one file, read the header from the first line and generate the `ColumnInfo` list.
>  * If there are multiple files, read the header from all the files, and throw an error if the headers across files do not match.
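
The header check described in the two bullets above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the code in the attached patch: it reads local files through `java.nio` rather than through the Hadoop `FileSystem` API, it splits the header on plain commas without handling quoted fields, and the names `HeaderCheckSketch`, `readHeader`, and `commonHeader` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of CsvBulkLoadTool or the patch.
class HeaderCheckSketch {

    // Reads the first line of one file and splits it into column names.
    // Assumption: a naive comma split; quoted fields are not handled.
    static List<String> readHeader(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String firstLine = reader.readLine();
            if (firstLine == null) {
                throw new IOException("Empty input file: " + file);
            }
            List<String> columns = new ArrayList<>();
            for (String name : firstLine.split(",", -1)) {
                columns.add(name.trim());
            }
            return columns;
        }
    }

    // Returns the header shared by all files, or throws if any file disagrees.
    static List<String> commonHeader(List<Path> files) throws IOException {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("No input files given");
        }
        List<String> expected = null;
        for (Path file : files) {
            List<String> header = readHeader(file);
            if (expected == null) {
                expected = header; // the first file defines the reference header
            } else if (!expected.equals(header)) {
                throw new IllegalArgumentException(
                        "Header of " + file + " does not match the other input files");
            }
        }
        return expected;
    }

    public static void main(String[] args) throws IOException {
        // Two temporary files with identical headers, used only for the demo.
        Path a = Files.createTempFile("part-a", ".csv");
        Path b = Files.createTempFile("part-b", ".csv");
        Files.write(a, Arrays.asList("ID,NAME,TYPE", "1,foo,x"));
        Files.write(b, Arrays.asList("ID,NAME,TYPE", "2,bar,y"));
        System.out.println(commonHeader(Arrays.asList(a, b)));
    }
}
```

With a single input file the loop runs once and simply returns that file's header, which covers the first bullet; the column names returned would then be the starting point for building the `ColumnInfo` list.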



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
