hive-issues mailing list archives

From "Zoltan Haindrich (JIRA)" <>
Subject [jira] [Commented] (HIVE-19943) Header values keep showing up in result sets
Date Fri, 13 Jul 2018 10:20:00 GMT


Zoltan Haindrich commented on HIVE-19943:

I'm not sure how this is supposed to be fixed; adding these as inputformat args is
a dead end because the actual reader is some kind of "linereader" from hadoop...
I feel that this "HiveRecordReader" should somehow be pushed under the llaprecordreader...but
that seems like a hard thing to do (and probably not the right move)...

[~sershe] do you have any suggestion?

To reproduce, patch an existing test which by mistake was only run locally, so
it missed this issue all along... (then run it with TestMiniLlapCliDriver):
diff --git ql/src/test/queries/clientpositive/file_with_header_footer.q ql/src/test/queries/clientpositive/file_with_header_footer.q
index 8913e54ad0..5dddcaba2a 100644
--- ql/src/test/queries/clientpositive/file_with_header_footer.q
+++ ql/src/test/queries/clientpositive/file_with_header_footer.q
@@ -11,6 +11,10 @@ CREATE EXTERNAL TABLE header_footer_table_1 (name string, message string, id int
 SELECT * FROM header_footer_table_1;
+SELECT count(distinct name) FROM header_footer_table_1;
+SELECT assert_true(count(distinct name)=11) FROM header_footer_table_1;
 SELECT * FROM header_footer_table_1 WHERE id < 50;
 CREATE EXTERNAL TABLE header_footer_table_2 (name string, message string, id int) PARTITIONED BY (year int, month int, day int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
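
To illustrate what the added assert_true is meant to catch, here is a minimal Python sketch. It is not Hive code and does not reflect how Hive or LLAP read files: it assumes a hypothetical tab-separated file with one header line and two footer lines, and compares a reader that honors the skip counts with one that ignores them. The function name read_rows and the sample rows are made up for illustration.

```python
def read_rows(lines, skip_header=0, skip_footer=0):
    """Split lines on tabs after dropping leading header and trailing footer lines."""
    end = len(lines) - skip_footer
    return [line.split("\t") for line in lines[skip_header:end]]


# Hypothetical file contents: 1 header line, 3 data rows, 2 footer lines.
lines = [
    "name\tmessage\tid",
    "steven\thive\t1",
    "dave\thadoop\t2",
    "steven\thdfs\t3",
    "footer line 1",
    "footer line 2",
]

# Reader that honors the skip counts (the behavior the table properties ask for).
expected = {row[0] for row in read_rows(lines, skip_header=1, skip_footer=2)}

# Reader that ignores the skip counts (the symptom described in this issue).
leaky = {row[0] for row in read_rows(lines)}

print(sorted(expected))
print(sorted(leaky))
```

In this sketch the distinct count differs between the two readers, which is the kind of mismatch a fixed expected count in the .q file would flag.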

> Header values keep showing up in result sets
> --------------------------------------------
>                 Key: HIVE-19943
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.0
>         Environment: HDInsight Hive Interactive Query
> [Components|]
>            Reporter: Liam De Lee
>            Priority: Major
> We are using the tblproperties ("skip.header.line.count"="1") when creating an external table.
> When we do a select * from table we get it back as expected, without the header present in the result set.
> However, when we do for instance a count(1), the header is included in the count (tested with a select * from table, pasting the result into notepad to find the number of rows).
> If we also do this with a select distinct(column) from table, we also get the header as a distinct value.
> file structure:
> |adf|
> |hyg|
> |abc|
> *Update: 26/06/2018*
> Create statement:
> {code:java}
> -----------------------------------
> --test_type--
> -----------------------------------
>   (
>     test_type      string
>     )
> LOCATION 'adl://{adlslocation}data/data2/test'
> tblproperties ("skip.header.line.count"="1")
> {code}
>  Select statement:
> {code:java}
> select * from test_type_in;
> {code}
> Distinct statement:
> {code:java}
> select distinct test_type from test_type_in ORDER BY test_type;
> {code}
> I cannot show the exact statement because of an NDA, so I changed those values to test.
> I can also tell you it is not just at our HDInsight but also at another company we are working for. It does not matter what is in the data, either. So for testing purposes:
> {code:java}
> test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}

