hive-user mailing list archives

From "Mich Talebzadeh" <m...@peridale.co.uk>
Subject RE: ORA-8177 with Hive transactions
Date Thu, 24 Sep 2015 20:27:44 GMT
Yes, I came across this back in April while trying to load 1.7 million rows from an RDBMS into
Hive via SAP Replication Server.

 

My notes were:

 

“Trying to sync a table from ASE --> RS --> Hadoop via DIRECT LOAD. The source
has 1.7 million rows and is populating a Hive table. However, I only get around 50K rows into
the Hive table before the MapReduce jobs give up and get killed.

 

I have turned on concurrency and use an Oracle database as the metastore. Data, I believe, is delivered
in bulk through files shown as rs_temp__nnn below. I thought that by turning concurrency on
in Hive I would have resolved the problem.

 

2015-04-16 17:04:34,773 WARN  [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,784 INFO  [pool-3-thread-197]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(713))
- 158: source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab81d6ab0_t

2015-04-16 17:04:34,784 INFO  [pool-3-thread-197]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(339))
- ugi=hduser     ip=127.0.0.1    cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab81d6ab0_t

2015-04-16 17:04:34,785 WARN  [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,798 WARN  [pool-3-thread-154]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,799 WARN  [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,808 INFO  [pool-3-thread-198]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(713))
- 162: source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,809 INFO  [pool-3-thread-198]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(339))
- ugi=hduser     ip=127.0.0.1    cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,810 WARN  [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,813 WARN  [pool-3-thread-154]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,827 ERROR [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(931))
- Too many repeated deadlocks in unlock, giving up.

2015-04-16 17:04:34,835 WARN  [pool-3-thread-154]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,839 ERROR [pool-3-thread-199]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(141))
- org.apache.thrift.TException: 

MetaException(message:Unable to update transaction database java.sql.SQLException: ORA-08177:
can't serialize access for this transaction

 

 

Now, ORA-08177 points to the transaction ordering of the bulk data coming from RS. According to
the docs, ORA-08177 can be caused only by serializable transactions. It means that a row which
the serializable transaction is trying to modify was modified by another transaction after
the serializable transaction began.”
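For illustration, the scenario the docs describe can be reproduced in two Oracle sessions; the table name t and its columns here are hypothetical:

```sql
-- Session 1: start a serializable transaction (snapshot taken at first statement)
ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;
SELECT a FROM t WHERE id = 1;

-- Session 2 (separate connection): change and commit the same row
UPDATE t SET a = a + 1 WHERE id = 1;
COMMIT;

-- Session 1: try to modify the row that changed after its snapshot
UPDATE t SET a = a + 1 WHERE id = 1;
-- => ORA-08177: can't serialize access for this transaction
```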

 

After a couple of weeks I came up with the following approach

 

“OK guys,

 

Some good news

 

Sounds like setting these two parameters helps!

 

1.     Set these two Replication Server connection parameters:

-- Parameter "mat_load_tran_size", Default: 10000, specifies the optimal transaction size
or batch size for the initial copying of primary data to the replicate table during direct
load materialization.

alter connection to hiveserver2.asehadoop set mat_load_tran_size to "50000"

go

 

-- Parameter "max_mat_load_threads", Default: 5, specifies the maximum number of load threads
for each table being materialized.

alter connection to hiveserver2.asehadoop set max_mat_load_threads to "1"

go

 

This makes things work without falling over.

 

2.     You need to have concurrency enabled in the Hive metastore. Mine is on Oracle. You need to
run a separate SQL script against it, labelled like hive-txn-schema-0.14.0.oracle.sql, after the
basic one, hive-schema-0.14.0.oracle.sql.
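A sketch of how the two scripts might be run from SQL*Plus; the metastore schema owner and connect string are assumptions:

```sql
-- Connect as the Hive metastore schema owner, e.g.:
-- sqlplus hiveuser/password@metastoredb

-- Base metastore schema first
@hive-schema-0.14.0.oracle.sql

-- Then the transaction tables (TXNS, HIVE_LOCKS, etc.)
@hive-txn-schema-0.14.0.oracle.sql
```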

3.     Make sure that concurrency is enabled in Hive. hive.support.concurrency <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency>
is false by default.
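A minimal sketch of the corresponding settings, shown session-level for clarity; in a real deployment they go in hive-site.xml, and the DbTxnManager line is an addition here, not something stated above:

```sql
-- Session-level sketch; set these in hive-site.xml for a real deployment
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```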

4.     Once concurrency in Hive is enabled, you need to install and run Apache ZooKeeper <https://zookeeper.apache.org/>
for distributed lock management; otherwise you are going to encounter deadlock or serialisation
issues in your metadata, as below:

2015-04-16 17:04:34,785 WARN  [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(928))
- Deadlock detected in unlock, trying again.

2015-04-16 17:04:34,827 ERROR [pool-3-thread-199]: txn.TxnHandler (TxnHandler.java:detectDeadlock(931))
- Too many repeated deadlocks in unlock, giving up.

2015-04-16 17:04:34,839 ERROR [pool-3-thread-199]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(141))
- org.apache.thrift.TException: 

MetaException(message:Unable to update transaction database java.sql.SQLException: ORA-08177:
can't serialize access for this transaction
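Point 4 can be sketched as the following lock-manager settings; the property values are illustrative, and hive.zookeeper.quorum must point at your own ZooKeeper ensemble:

```sql
-- Sketch; normally set in hive-site.xml rather than per session
SET hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager;
SET hive.zookeeper.quorum=localhost:2181;
```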

 

5.     I believe the best way is to add a searchable column (the PK) to the repdef and replicate
a sub-range of rows via a where clause, say ½ million rows at a time.
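The idea in point 5 can be sketched as range-bounded batches; all table and column names below are hypothetical, and in practice the where clause would live in the repdef rather than in ad-hoc SQL:

```sql
-- Load ½ million rows per batch, bounded by the searchable PK column
INSERT INTO TABLE asehadoop.t SELECT * FROM staging_t WHERE id > 0      AND id <= 500000;
INSERT INTO TABLE asehadoop.t SELECT * FROM staging_t WHERE id > 500000 AND id <= 1000000;
```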

 

OK, just now I managed to get ½ million rows into Hive pretty quickly.

 

Source table in ASE

 

1> select count(1) from t

2> go

 

-----------

      500000

 

(1 row affected)

 

In the replicate database asehadoop in Hive:

 

hive> use asehadoop;

OK

hive> select count(1) from t;

Query ID = hduser_20150429223737_1c0522a4-a7d2-4b71-873a-49e698e4c17a

Starting Job = job_1430341149959_0052, Tracking URL = http://rhes564:8088/proxy/application_1430341149959_0052/

Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1

2015-04-29 22:37:47,737 Stage-1 map = 0%,  reduce = 0%

2015-04-29 22:37:55,028 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 2.92 sec

2015-04-29 22:37:56,062 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 8.95 sec

2015-04-29 22:38:01,262 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 11.8 sec

2015-04-29 22:38:02,295 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 13.28 sec

2015-04-29 22:38:03,335 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 14.52 sec

OK

500000

 

 

I really need to go back and retry these tests.

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.


co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/> 

 

NOTE: The information in this email is proprietary and confidential. This message is for the
designated recipient only, if you are not the intended recipient, you should destroy it immediately.
Any information in this message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility
of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd,
its subsidiaries nor their employees accept any responsibility.

 

From: Steve Howard [mailto:stevedhoward@gmail.com] 
Sent: 24 September 2015 14:53
To: user@hive.apache.org
Subject: Re: ORA-8177 with Hive transactions

 

All,

 

We continue to struggle with this.  We *never* get the lock, and found one issue in which
the retry logic gets into an infinite loop.  We submitted a JIRA for that (https://issues.apache.org/jira/browse/HIVE-11934),
and patched our version (HDP 2.3, Hive 1.2.1) with a fix in which the deadlockCount variable
is no longer managed in the lock() method.  That works, but we still couldn't get the lock,
and the exception was thrown after ten retries.  At least we knew it was broken earlier ;)

 

We have made the changes to the HIVE transaction tables to enable ROWDEPENDENCIES, but are
still plagued with serialization errors that are never resolved.

 

We have only a single writer, as the Hive database environment is used as a target for an
existing EDW dataset.  The job loader is the only one making changes.  However, it is for
analytics, so we have a lot of readers.

 

We have considered changing TRANSACTION_SERIALIZABLE to READ_COMMITTED for the dbConn() method call in the
TxnHandler class, as Oracle provides consistent reads.  Of course, the serialization
exception is thrown when one thread (a reader or a writer, I guess) trying to lock a hive table
(or, in our case, several hundred daily hive table partitions) attempts to update the row,
and another thread has changed and committed it in the meantime.

 

Unless I am missing it, this will always be an issue, since we have readers and writers and each
appears to take a lock.

 

If we know we will have a single writer, the largest risk is that the reader thinks the data
hasn't changed, when it has.  For our needs, that isn't a huge issue.

 

Are we missing something?  Any ideas?

 

Thanks,

 

Steve

 

On Fri, Sep 18, 2015 at 3:39 PM, Steve Howard <stevedhoward@gmail.com <mailto:stevedhoward@gmail.com>
> wrote:

I think ROWDEPENDENCIES on an Oracle table also covers this issue, so I don't think a separate
JIRA is needed for the INITRANS change.
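For what it's worth, ROWDEPENDENCIES is a create-time attribute in Oracle, so it cannot be switched on with ALTER TABLE; an existing table has to be rebuilt. A sketch, with the real table's indexes, constraints, and grants left out and the names illustrative:

```sql
-- Rebuild with row-level dependency tracking (cannot be added via ALTER TABLE)
CREATE TABLE hive_locks_rd ROWDEPENDENCIES AS SELECT * FROM hive_locks;
-- then drop/rename to swap in the new table and recreate indexes
```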

 

On Fri, Sep 18, 2015 at 2:51 PM, Sergey Shelukhin <sergey@hortonworks.com <mailto:sergey@hortonworks.com>
> wrote:

There’s HIVE-11831 <https://issues.apache.org/jira/browse/HIVE-11831>  and https://issues.apache.org/jira/browse/HIVE-11833
that try to address this.

We can do a patch similar to the first one; can you file a JIRA?

 

From: Steve Howard <stevedhoward@gmail.com <mailto:stevedhoward@gmail.com> >
Reply-To: "user@hive.apache.org <mailto:user@hive.apache.org> " <user@hive.apache.org
<mailto:user@hive.apache.org> >
Date: Friday, September 18, 2015 at 10:54
To: "user@hive.apache.org <mailto:user@hive.apache.org> " <user@hive.apache.org <mailto:user@hive.apache.org>
>
Subject: ORA-8177 with Hive transactions

 

While troubleshooting an issue with transactions shortly after enabling them, I noticed the
following in an Oracle trace, which is our metastore for hive... 

 

ORA-8177: can't serialize access for this transaction

 

These were thrown on "insert into HIVE_LOCKS..."

 

Traditionally in Oracle, if an application actually needs serializable transactions, the fix
is to set initrans and maxtrans to accommodate the number of concurrent writers.  When I ran what is
below on a table similar to HIVE_LOCKS, this exception was thrown everywhere.  The fix is
to recreate the table with a higher value for initrans (the default is only 1 for initrans,
and 255 for maxtrans).  When I did this and re-ran what is below, the exceptions
were no longer thrown.

 

Does anyone have any feedback on this performance hint?  The exceptions in hive are thrown
from the checkRetryable method in the TxnHandler class, but I couldn't find which class.method
throws them.  Perhaps the exceptions are not impactful, but given that the method expects
them as it checks for the string in the exception message, I thought I would ask for feedback
before we recreate the HIVE_LOCKS table with a higher value for INITRANS.
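For reference, INITRANS can also be raised on an existing table with ALTER TABLE, but the new value only applies to newly formatted blocks; a MOVE (or full rebuild) is needed for existing blocks, and indexes must be rebuilt afterwards. A sketch, with the value 16 chosen arbitrarily:

```sql
-- New blocks get 16 transaction slots; existing blocks are unchanged
ALTER TABLE hive_locks INITRANS 16;
-- Rewrite existing blocks so they pick up the setting (marks indexes UNUSABLE)
ALTER TABLE hive_locks MOVE;
-- Rebuild each index afterwards, e.g.:
-- ALTER INDEX <index_name> REBUILD;
```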

 

import java.sql.*;

public class testLock implements Runnable {
  public static void main(String[] args) throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    // Spawn 100 concurrent writer threads against the same table
    for (int i = 1; i <= 100; i++) {
      new testLock();
    }
  }

  public testLock() {
    Thread t = new Thread(this);
    t.start();
  }

  public void run() {
    try (Connection conn = DriverManager.getConnection(
           "jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice")) {
      // Each thread runs under SERIALIZABLE, mimicking the metastore's TxnHandler
      conn.createStatement().execute("alter session set isolation_level = serializable");
      PreparedStatement pst = conn.prepareStatement("update test set a = ?");
      for (int j = 1; j <= 10000; j++) {
        pst.setInt(1, j);
        pst.execute();
        conn.commit();
        System.out.println("worked");
      }
    }
    catch (Exception e) {
      // ORA-08177 surfaces here when another thread committed the row first
      System.out.println(e.getMessage());
    }
  }
}

 

 

