hive-issues mailing list archives

From "Ashutosh Bapat (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20953) Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded
Date Thu, 29 Nov 2018 06:33:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Bapat updated HIVE-20953:
----------------------------------
    Description: 
The testcase is intended to test REPL LOAD with retry. The test creates a partitioned table
and a function in the source database and loads them into the replica. The first attempt to
load the dump is intended to fail while loading one of the partitions. Depending on the order
in which the objects get loaded, if the function is queued after the table, it will not be
available in the replica after the load failure. But if it is queued before the table, it will
be available in the replica even after the load failure. The test assumes the latter case,
which may not always be true.

Hence, fix the testcase to load the objects in a fixed order. When hive.in.repl.test.files.sorted
is set to true, the objects are ordered by their directory names. This ordering is available
with minimal changes for testing, so we use it. With this ordering, a function gets loaded
before a table. So the test is changed to not expect the function to be available after the
failed load, but to be available after the retry.

While writing that testcase, I found that even if a function fails to load, it is visible through
SHOW FUNCTIONS and can be called just as if the failure had not happened. Digging further, I
found that when creating a function we add it both to the registry and to the metastore. If the
latter fails, we do not remove it from the registry, so it remains visible after the failure.
Fixed that as well.
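The registry fix amounts to rolling back the registry entry when the metastore call fails. A
minimal Java sketch of that pattern follows; the Map-based registry, the MetaStore interface
and the createFunction method are hypothetical stand-ins, not Hive's actual classes or methods.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateFunctionSketch {

    /** Hypothetical stand-in for the metastore call that persists a function. */
    public interface MetaStore {
        void createFunction(String name, String className) throws Exception;
    }

    /**
     * Registers the function in the in-memory registry, then persists it.
     * If persisting fails, the registry entry is removed again so the failed
     * function is neither listed nor callable.
     */
    public static void createFunction(Map<String, String> registry, String name,
                                      String className, MetaStore metaStore) throws Exception {
        registry.put(name, className);                  // step 1: register in memory
        try {
            metaStore.createFunction(name, className);  // step 2: persist
        } catch (Exception e) {
            registry.remove(name);                      // undo step 1 on failure
            throw e;
        }
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        MetaStore failing = (n, c) -> { throw new Exception("metastore unavailable"); };
        try {
            createFunction(registry, "f1", "com.example.F1", failing);
        } catch (Exception e) {
            System.out.println("create failed: " + e.getMessage());
        }
        System.out.println(registry.containsKey("f1")); // false
    }
}
```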

  was:
The testcase is intended to test REPL LOAD with retry. The test creates a partitioned table
and a function in the source database and loads them into the replica. The first attempt to
load the dump is intended to fail while loading one of the partitions. Depending on the order
in which the objects get loaded, if the function is queued after the table, it will not be
available in the replica after the load failure. But if it is queued before the table, it will
be available in the replica even after the load failure. The test assumes the latter case,
which may not always be true.

Hence, fix the testcase to load the objects in a fixed order. When hive.in.repl.test.files.sorted
is set to true, the objects are ordered by their directory names. This ordering is available
with minimal changes for testing, so we use it. With this ordering, a function gets loaded
before a table. So the test is changed to not expect the function to be available after the
failed load, but to be available after the retry.


> Fix testcase TestReplicationScenariosAcrossInstances#testBootstrapReplLoadRetryAfterFailureForPartitions to not depend upon the order in which objects get loaded
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20953
>                 URL: https://issues.apache.org/jira/browse/HIVE-20953
>             Project: Hive
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 4.0.0
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20953.01, HIVE-20953.02, HIVE-20953.02, test_func_load_failure_retry.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
