hive-issues mailing list archives

From "Marta Kuczora (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18696) The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an exception occurs
Date Tue, 20 Feb 2018 13:23:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370047#comment-16370047 ]

Marta Kuczora commented on HIVE-18696:
--------------------------------------

A ConcurrentModificationException also occurs sometimes when running the tests that check
whether the folders are cleaned up properly. This is because tasks can still be running and
adding new entries to the addedPartitions map while the finally block iterates through
the map.
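
The failure mode can be reproduced in isolation. The following is a minimal sketch, not Hive code: the class and method names are hypothetical, and a plain HashMap is assumed to stand in for addedPartitions. Adding an entry while an iteration is in progress makes the next iterator step fail fast, whereas a ConcurrentHashMap iterator is weakly consistent and tolerates the insertion.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-alone demo; none of these names come from Hive.
class IterationDemo {

  // Iterates over the map and adds one new entry in the middle of the
  // iteration, the way a still-running task could. Returns true if the
  // iterator fails fast with a ConcurrentModificationException.
  static boolean failsWhenModifiedDuringIteration(Map<String, Boolean> map) {
    map.put("p1", true);
    map.put("p2", true);
    try {
      boolean added = false;
      for (String key : map.keySet()) {
        if (!added) {
          // Simulates a late task registering its folder during cleanup.
          map.put("p3", true);
          added = true;
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("HashMap fails: "
        + failsWhenModifiedDuringIteration(new HashMap<>()));
    System.out.println("ConcurrentHashMap fails: "
        + failsWhenModifiedDuringIteration(new ConcurrentHashMap<>()));
  }
}
```

Switching the map type only removes the exception. A task that finishes after the cleanup loop has passed can still leave its folder behind, so the cleanup also has to wait for the running tasks.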

> The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an exception occurs
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18696
>                 URL: https://issues.apache.org/jira/browse/HIVE-18696
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>
> When adding multiple partitions, if one of them cannot be created successfully, none of
> the partitions are created, but the folders might not be cleaned up properly. See
> the test case "testAddPartitionsOneInvalid" in the TestAddPartitions test.
> This is the problematic code in the HiveMetaStore.add_partitions_core method:
> {code:java}
>         for (final Partition part : parts) {
>           if (!part.getTableName().equals(tblName) || !part.getDbName().equals(dbName)) {
>             throw new MetaException("Partition does not belong to target table "
>                 + dbName + "." + tblName + ": " + part);
>           }
>           boolean shouldAdd = startAddPartition(ms, part, ifNotExists);
>           if (!shouldAdd) {
>             existingParts.add(part);
>             LOG.info("Not adding partition " + part + " as it already exists");
>             continue;
>           }
>           final UserGroupInformation ugi;
>           try {
>             ugi = UserGroupInformation.getCurrentUser();
>           } catch (IOException e) {
>             throw new RuntimeException(e);
>           }
>           partFutures.add(threadPool.submit(new Callable<Partition>() {
>             @Override
>             public Partition call() throws Exception {
>               ugi.doAs(new PrivilegedExceptionAction<Object>() {
>                 @Override
>                 public Object run() throws Exception {
>                   try {
>                     boolean madeDir = createLocationForAddedPartition(table, part);
>                     if (addedPartitions.put(new PartValEqWrapper(part), madeDir) != null) {
>                       // Technically, for ifNotExists case, we could insert one and discard the other
>                       // because the first one now "exists", but it seems better to report the problem
>                       // upstream as such a command doesn't make sense.
>                       throw new MetaException("Duplicate partitions in the list: " + part);
>                     }
>                     initializeAddedPartition(table, part, madeDir);
>                   } catch (MetaException e) {
>                     throw new IOException(e.getMessage(), e);
>                   }
>                   return null;
>                 }
>               });
>               return part;
>             }
>           }));
>         }
> {code}
> While iterating over the partitions, suppose the tasks that create the folders are submitted
> successfully for the first two partitions, but an exception occurs for the third partition
> before its task is submitted. (This can happen if the partition has a different table
> or db name than the others, or if it has an invalid value.)
> In this case execution jumps to the finally block, where the folders recorded in the
> "addedPartitions" map are cleaned up. However, the tasks for the first two partitions may
> not have finished creating their folders yet, so the map can be empty or contain only
> one of the partitions, and any folder created after the cleanup is left behind.
> This issue also occurs in the HiveMetaStore.add_partitions_pspec_core method, because this
> part of the code is the same as in the add_partitions_core method.
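
One way to close this window is to wait for every task that was already submitted before running the cleanup. The sketch below shows that idea under stated assumptions and is not the fix applied in Hive: strings stand in for Partition objects, an empty string plays the role of the invalid partition, map entries stand in for created folders, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-alone sketch; none of these names come from Hive.
class CleanupSketch {

  // Returns the number of "folders" removed by the cleanup (0 on success).
  // Real code would rethrow the failure instead of swallowing it.
  static int addOrCleanUp(List<String> parts, ExecutorService pool) {
    Map<String, Boolean> added = new ConcurrentHashMap<>();
    List<Future<?>> futures = new ArrayList<>();
    boolean success = false;
    int cleaned = 0;
    try {
      for (String part : parts) {
        if (part.isEmpty()) {
          // Validation fails before the task for this partition is submitted.
          throw new IllegalArgumentException("Invalid partition");
        }
        futures.add(pool.submit(() -> {
          Thread.sleep(50); // simulates slow folder creation
          added.put(part, true);
          return null;
        }));
      }
      for (Future<?> f : futures) {
        f.get();
      }
      success = true;
    } catch (IllegalArgumentException | ExecutionException e) {
      // Fall through to the cleanup in the finally block.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      if (!success) {
        // Wait for every submitted task, so the map is complete and no
        // task is still writing to it while the cleanup iterates.
        for (Future<?> f : futures) {
          awaitQuietly(f);
        }
        cleaned = added.size();
        added.clear(); // stands in for deleting the created folders
      }
    }
    return cleaned;
  }

  private static void awaitQuietly(Future<?> f) {
    try {
      f.get();
    } catch (ExecutionException e) {
      // The task failed; anything it registered is already in the map.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    System.out.println(addOrCleanUp(Arrays.asList("a", "b", ""), pool));
    pool.shutdown();
  }
}
```

With the wait in place, both folders created for the first two partitions are seen by the cleanup even though the failure happens while their tasks are still running. Cancelling the futures instead of waiting would be faster, but a task that has already started creating its folder would still need to be accounted for.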



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)