hive-user mailing list archives

From Joel Victor <>
Subject Concurrency support of Apache Hive for streaming data ingest at 7K RPS into multiple tables
Date Wed, 24 Aug 2016 07:35:15 GMT
Currently I am using Apache Hive 0.14, which ships with HDP 2.2. We are
trying to perform streaming ingestion with it.
We are using the Storm Hive bolt and we have 7 tables into which we are
trying to insert. The RPS (requests per second) of our bolts ranges from
5000 to 7000 and our commit policies are configured accordingly, i.e. 100k
events or 15 seconds.
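
To make the commit policy concrete, here is a minimal sketch of a "100k events or 15 seconds, whichever comes first" rule. The class and method names are hypothetical and only illustrate the idea; this is not the Storm Hive bolt's actual API or implementation.

```python
import time


class CommitPolicy:
    """Hypothetical count-or-time commit policy (not the Storm Hive bolt API)."""

    def __init__(self, max_events=100_000, max_seconds=15.0, clock=time.monotonic):
        self.max_events = max_events      # commit once this many events are buffered
        self.max_seconds = max_seconds    # ...or once this much time has passed
        self.clock = clock                # injectable clock, so the policy is testable
        self.pending = 0
        self.last_commit = clock()

    def record(self, n=1):
        """Count n newly buffered events and report whether a commit is due."""
        self.pending += n
        return self.should_commit()

    def should_commit(self):
        """True if there is buffered data and either threshold has been reached."""
        if self.pending == 0:
            return False
        elapsed = self.clock() - self.last_commit
        return self.pending >= self.max_events or elapsed >= self.max_seconds

    def committed(self):
        """Reset the counters after a successful commit."""
        self.pending = 0
        self.last_commit = self.clock()
```

Assuming a single writer receives the full rate, 100k events accumulate in roughly 14.3 seconds at 7000 RPS, so the count threshold fires just before the timer; at 5000 RPS it would take 20 seconds, so the 15 second timer fires first.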

We see many commitTxn exceptions due to serialization errors
in the metastore (we are using PostgreSQL 9.5 as the metastore database).
The serialization errors cause the topology to start lagging in terms
of events processed, as it tries to reprocess the batches that have failed.

I have already backported HIVE-10500 to 0.14 and there isn't
much improvement.
I went through most of the JIRAs about transactions and I found the
following: HIVE-11948 and HIVE-13013. I would like
to backport them to 0.14.
Going through the patches gives me the impression that I mostly need to
update the queries and transaction isolation levels.
Do these patches also require me to update the schema in the metastore?
Please also let me know if there are any other patches that I missed.

I would also like to know whether Apache Hive can handle inserts to the
same/different tables concurrently from multiple clients in 1.2.1 or later
versions without many serialization errors in Hive metastore?

