hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Camille Marvin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6918) ALTER TABLE using embedded metastore fails with duplicate key violation in 'dbo.SERDES'
Date Wed, 16 Apr 2014 00:38:15 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Camille Marvin updated HIVE-6918:
---------------------------------

    Description: 
An HDINSIGHT customer is doing some heavy metadata operations using an embedded metastore.
They get an error with a duplicate key in a metastore table 'dbo.SERDES'. They have multiple
concurrent jobs doing ALTER TABLE concurrently (on different tables) using the same metastore
database, but with each job having an embedded metastore because they set hive.metastore.uris
to the empty string.

The script looks like:

set hive.metastore.uris=;
...
CREATE EXTERNAL TABLE IF NOT EXISTS InputData_828c53de_ad24_928e_3db3_948cf821a3e0 (
...
)
PARTITIONED BY (tenant string, d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
... (several more like this);
ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS PARTITION (tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11',
d='20140414') LOCATION 'wasb://inputblob@storage27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
... several more like the above (14 ALTER TABLE statements in a row)
...

Then they get this error:

...
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 'PK_serdes_SERDE_ID'. Cannot
insert duplicate key in object 'dbo.SERDES'. The duplicate key value is (209703).
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
	at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
	at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
	at org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
...

  was:
An HDINSIGHT customer is doing some heavy metadata operations using an embedded metastore.
They get an error with a duplicate key in a metastore table 'dbo.SERDES'. They have multiple
concurrent jobs doing ALTER TABLE concurrently (on different tables, I believe) using the
same metastore database, but with each job having an embedded metastore because they set hive.metastore.uris
to the empty string.

The script looks like:

set hive.metastore.uris=;
...
CREATE EXTERNAL TABLE IF NOT EXISTS InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0
(
...
)
PARTITIONED BY (tenant string, d string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
;
ALTER TABLE InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
... (several more like this);
ALTER TABLE InstanceSpaceData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS PARTITION
(tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') LOCATION 'wasb://instancespaceblob@advisor27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
... several more like the above (14 ALTER TABLE statements in a row)
...

Then they get this error:

...
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 'PK_serdes_SERDE_ID'. Cannot
insert duplicate key in object 'dbo.SERDES'. The duplicate key value is (209703).
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
	at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
	at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
	at org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
...



removing product-specific attributes from script, clarifying uncertainty (yes, it is separate
tables)

> ALTER TABLE using embedded metastore fails with duplicate key violation in 'dbo.SERDES'
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-6918
>                 URL: https://issues.apache.org/jira/browse/HIVE-6918
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.11.0
>         Environment: hive-0.11.0.1.3.7.0-01272
> HDInsight version: 2.1.4.0.661685
>            Reporter: Eric Hanson
>
> An HDINSIGHT customer is doing some heavy metadata operations using an embedded metastore.
They get an error with a duplicate key in a metastore table 'dbo.SERDES'. They have multiple
concurrent jobs doing ALTER TABLE concurrently (on different tables) using the same metastore
database, but with each job having an embedded metastore because they set hive.metastore.uris
to the empty string.
> The script looks like:
> set hive.metastore.uris=;
> ...
> CREATE EXTERNAL TABLE IF NOT EXISTS InputData_828c53de_ad24_928e_3db3_948cf821a3e0 (
> ...
> )
> PARTITIONED BY (tenant string, d string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> ;
> ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ...;
> ... (several more like this);
> ALTER TABLE InputData_828c53de_ad24_928e_3db3_948cf821a3e0 ADD IF NOT EXISTS PARTITION
(tenant='8dddaf7c-2354-47ae-87a7-b781f14f8c11', d='20140414') LOCATION 'wasb://inputblob@storage27415020383770839.blob.core.windows.net/v0/tenant=8dddaf7c-2354-47ae-87a7-b781f14f8c11/d=20140414/';
> ... several more like the above (14 ALTER TABLE statements in a row)
> ...
> Then they get this error:
> ...
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> NestedThrowablesStackTrace:
> java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint 'PK_serdes_SERDE_ID'.
Cannot insert duplicate key in object 'dbo.SERDES'. The duplicate key value is (209703).
> 	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1160)
> 	at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:469)
> 	at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:583)
> 	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:291)
> 	at org.datanucleus.store.rdbms.SQLController.getStatementForQuery(SQLController.java:267)
> 	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.getValue(RDBMSJoinMapStore.java:656)
> 	at org.datanucleus.store.rdbms.scostore.RDBMSJoinMapStore.putAll(RDBMSJoinMapStore.java:195)
> 	at org.datanucleus.store.mapped.mapping.MapMapping.postInsert(MapMapping.java:135)
> 	at org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:517)
> ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message