hive-dev mailing list archives

From "Mohammad Islam" <>
Subject Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde
Date Wed, 07 Aug 2013 02:13:07 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Aug. 7, 2013, 2:13 a.m.)

Review request for hive, Ashutosh Chauhan and Jakob Homan.


Add logic to avoid excessive logging for each record.

Bugs: HIVE-4732

Repository: hive-git


From our performance analysis, we found that AvroSerde's schema.equals() call consumed a substantial
amount (nearly 40%) of the time. This patch aims to minimize the number of schema.equals() calls
by deferring the check as long as possible and performing it as rarely as possible.

First, we added a unique ID for each record reader, which is then included in every AvroGenericRecordWritable.
Then, we introduced two new data structures (one HashSet and one HashMap) to store intermediate
data and avoid duplicate checks. The HashSet contains the IDs of all record readers that don't
need any re-encoding. The HashMap, on the other hand, contains the re-encoders already in use. It
works as a cache and allows re-encoders to be reused. With this change, our tests show a nearly 40% reduction
in Avro record reading time.
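
The caching idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class and method names (ReEncoderCacheSketch, TaggedRecord, process) are hypothetical, schemas are represented by plain strings rather than Avro Schema objects, the "re-encoder" is a placeholder function, and keying the HashMap by reader ID is an assumption. It only shows how a set of reader IDs plus a map of cached re-encoders can reduce the expensive comparison to roughly one per record reader instead of one per record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the caching scheme; not the classes from the patch. */
final class ReEncoderCacheSketch {

    /** Stand-in for a record tagged with the ID of the reader that produced it. */
    static final class TaggedRecord {
        final UUID readerId;
        final String writerSchema; // stand-in for an Avro Schema
        final String payload;

        TaggedRecord(UUID readerId, String writerSchema, String payload) {
            this.readerId = readerId;
            this.writerSchema = writerSchema;
            this.payload = payload;
        }
    }

    private final String targetSchema;
    // IDs of readers whose records are known to need no re-encoding.
    private final Set<UUID> noReEncodingNeeded = new HashSet<>();
    // Re-encoders already built, keyed by reader ID (assumed key).
    private final Map<UUID, UnaryOperator<String>> reEncoders = new HashMap<>();
    private int schemaComparisons = 0;

    ReEncoderCacheSketch(String targetSchema) {
        this.targetSchema = targetSchema;
    }

    String process(TaggedRecord record) {
        // Fast path: this reader was already checked and matched the target.
        if (noReEncodingNeeded.contains(record.readerId)) {
            return record.payload;
        }
        UnaryOperator<String> reEncoder = reEncoders.get(record.readerId);
        if (reEncoder == null) {
            // The expensive comparison runs only the first time a reader is seen.
            schemaComparisons++;
            if (targetSchema.equals(record.writerSchema)) {
                noReEncodingNeeded.add(record.readerId);
                return record.payload;
            }
            // Placeholder re-encoder; a real one would convert between schemas.
            reEncoder = payload -> payload + "@" + targetSchema;
            reEncoders.put(record.readerId, reEncoder);
        }
        return reEncoder.apply(record.payload);
    }

    int schemaComparisons() {
        return schemaComparisons;
    }
}
```

In this sketch, processing any number of records from the same reader triggers a single schema comparison; later records hit either the set (no re-encoding) or the map (cached re-encoder).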

Diffs (updated)

  ql/src/java/org/apache/hadoop/hive/ql/io/avro/ ed2a9af 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/ e994411 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/ 66f0348

  serde/src/test/org/apache/hadoop/hive/serde2/avro/ 3828940 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/ 9af751b 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/ 2b948eb 




Mohammad Islam
