Date: Thu, 26 May 2016 14:41:12 +0000 (UTC)
From: "Bing Li (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

[ https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-13850:
---------------------------
    Description:
We have an application that connects to HiveServer2 via JDBC.
The application executes "INSERT INTO" queries against the same table.
When many users run the application at the same time, some of the INSERTs can fail.

The root cause is in Hive.checkPaths(), which uses the following loop to check whether the destination file already exists. The existence check and the later move are separate steps, so when multiple inserts run in parallel, two of them can choose the same "_copy_N" name and the second move fails.

    for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
      itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
    }

The Error Message
===========================
In hive log,
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-10000/000000_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/000000_0_copy_9014
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)

In hadoop log,
WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-10000/000000_0 to /apps/hive/warehouse/metadata.db/scalding_stats/000000_0_copy_9014 because destination exists

  was:
We have an application that connects to HiveServer2 via JDBC.
The application executes "INSERT INTO" queries against the same table.
When many users run the application at the same time, some of the INSERTs can fail.

In hive log,
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!!
Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-10000/000000_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/000000_0_copy_9014
    at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)

In hadoop log,
WARN hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-10000/000000_0 to /apps/hive/warehouse/metadata.db/scalding_stats/000000_0_copy_9014 because destination exists


> File name conflict when have multiple INSERT INTO queries running in parallel
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-13850
>                 URL: https://issues.apache.org/jira/browse/HIVE-13850
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Bing Li
>            Assignee: Bing Li
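
The check-then-move pattern named as the root cause can be avoided by claiming the destination name in a single atomic step. The following is a minimal sketch of that idea, assuming a local filesystem and java.nio.file rather than the Hadoop FileSystem API that Hive uses. The names UniqueCopyName and moveToUniqueName are hypothetical and exist only for this illustration; this is not the change that was made in Hive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/**
 * Illustration only: a local-filesystem analogue of the "_copy_N" naming loop.
 * UniqueCopyName and moveToUniqueName are hypothetical names, not Hive code.
 */
public class UniqueCopyName {

    /**
     * Moves src into destDir as name+filetype, or name_copy_N+filetype if taken.
     * The name is claimed with Files.createFile, which checks for existence and
     * creates the file as one atomic operation, so two concurrent callers cannot
     * both claim the same name.
     */
    public static Path moveToUniqueName(Path src, Path destDir, String name, String filetype)
            throws IOException {
        Path candidate = destDir.resolve(name + filetype);
        for (int counter = 1; ; counter++) {
            try {
                Files.createFile(candidate);   // atomic claim: throws if the name is taken
                break;
            } catch (FileAlreadyExistsException e) {
                // Someone else owns this name; try the next "_copy_N" suffix.
                candidate = destDir.resolve(name + "_copy_" + counter + filetype);
            }
        }
        // Replace the empty placeholder that this caller owns with the real data.
        return Files.move(src, candidate, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws Exception {
        Path dest = Files.createTempDirectory("warehouse");
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                try {
                    // Stand-in for a staged query result file.
                    Path staged = Files.createTempFile("staging", ".tmp");
                    moveToUniqueName(staged, dest, "000000_0", "");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        try (Stream<Path> files = Files.list(dest)) {
            System.out.println(files.count() + " files written");
        }
    }
}
```

In this sketch each parallel writer should end up with its own destination file, because a name that is already taken is detected by the claim itself rather than by an earlier exists() call. Whether an equivalent atomic claim is available on HDFS depends on the filesystem semantics and is not covered here.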