hive-dev mailing list archives

From "Xiaobing Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8401) Orc file merge operator only close last orc file it opened, which resulted in redundant data in table directory
Date Wed, 08 Oct 2014 18:17:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaobing Zhou updated HIVE-8401:
--------------------------------
    Description: 
Run the test
{noformat}
mvn -Phadoop-2  test -Dtest=TestCliDriver -Dqfile=alter_merge_2_orc.q
{noformat}
to reproduce it. In short, this query performs three data loads, which generate three ORC files.
ALTER TABLE CONCATENATE then tries to merge the ORC pieces into a single file, which is the final
file to be queried.

The output \hive\itests\qtest\target\qfile-results\clientpositive\alter_merge_2_orc.q.out shows
the number of records as 600, which is wrong; 610 is expected.

Because OrcFileMergeOperator closes only the last ORC file it opened, the 1st and 2nd ORC files
remain in the table directory: when MoveTask tries to copy the merged ORC file from the scratch
dir to the table dir, the cleanup of old data fails because the unclosed files cannot be deleted.
As a result, the query reads the old data (the 1st and 2nd ORC files).
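
The close-only-the-last-file pattern described above can be illustrated with a minimal, self-contained
Java sketch. This is not the actual OrcFileMergeOperator code or the HIVE-8401 patch; the class names
(CloseAllSketch, TrackedFile, LastOnlyMerger, CloseAllMerger) and the file names are hypothetical and
exist only to show why tracking every opened handle matters.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not Hive code. */
public class CloseAllSketch {

    /** Stand-in for an open ORC file handle; records whether close() was called. */
    static final class TrackedFile implements Closeable {
        final String name;
        boolean closed = false;
        TrackedFile(String name) { this.name = name; }
        @Override public void close() { closed = true; }
    }

    /** Problem pattern: each open() overwrites the previous handle, so only the last is closed. */
    static final class LastOnlyMerger {
        private TrackedFile current;
        void open(TrackedFile f) { current = f; }
        void closeOp() { if (current != null) current.close(); }
    }

    /** Safer pattern: remember every opened handle and close all of them. */
    static final class CloseAllMerger {
        private final List<Closeable> opened = new ArrayList<>();
        void open(Closeable f) { opened.add(f); }
        void closeOp() {
            IOException first = null;
            for (Closeable c : opened) {
                try {
                    c.close();
                } catch (IOException e) {
                    // Keep closing the remaining handles; report failures at the end.
                    if (first == null) first = e; else first.addSuppressed(e);
                }
            }
            opened.clear();
            if (first != null) throw new UncheckedIOException(first);
        }
    }

    /** Three made-up input files, mirroring the three data loads in the test. */
    static List<TrackedFile> threeFiles() {
        return Arrays.asList(new TrackedFile("part-1"), new TrackedFile("part-2"),
                new TrackedFile("part-3"));
    }

    static long countOpen(List<TrackedFile> files) {
        return files.stream().filter(f -> !f.closed).count();
    }

    public static void main(String[] args) {
        List<TrackedFile> a = threeFiles();
        LastOnlyMerger lastOnly = new LastOnlyMerger();
        for (TrackedFile f : a) lastOnly.open(f);
        lastOnly.closeOp();
        System.out.println("last-only: " + countOpen(a) + " file(s) left open");

        List<TrackedFile> b = threeFiles();
        CloseAllMerger closeAll = new CloseAllMerger();
        for (TrackedFile f : b) closeAll.open(f);
        closeAll.closeOp();
        System.out.println("close-all: " + countOpen(b) + " file(s) left open");
    }
}
```

In this sketch the first merger leaves two of the three handles open, which corresponds to the
1st and 2nd ORC files that could not be deleted; the second merger leaves none open.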


        Summary: Orc file merge operator only close last orc file it opened, which resulted
in redundant data in table directory  (was: Orc file merge operator didn't close files it
opened, which resulted in redundant data in table directory)

> Orc file merge operator only close last orc file it opened, which resulted in redundant
data in table directory
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8401
>                 URL: https://issues.apache.org/jira/browse/HIVE-8401
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>         Environment: Windows Server
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
