hive-dev mailing list archives

From "Mohammad Islam" <misla...@yahoo.com>
Subject Re: Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde
Date Mon, 15 Jul 2013 23:47:26 GMT


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > Do you have after-optimization performance numbers?  Can you add a test to verify
> > that the reencoder cache is working correctly?  Feed in a record with one uuid, then another
> > with a different and verify that the cache has two elements.  Adding a third record with the
> > original UUID shouldn't increase the size of the cache.  Also, that adding n records all with
> > the same schema creates only one reencoder...

Yes, we have numbers from after the optimization. For example, each record used to take
nearly 50 microseconds; after this patch, it takes nearly 31 microseconds.
Added the test case as proposed.
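
For reference, the cache behaviour that the requested test exercises can be sketched roughly
as follows. This is a simplified illustration and not the code in the patch: the class name
ReencoderCacheSketch, the lookup() method, and the use of plain strings in place of Avro
Schema objects are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for the deserializer's bookkeeping; all names are illustrative.
class ReencoderCacheSketch {
    // Placeholder for the real re-encoder, which would hold the writer and reader schemas.
    static final class Reencoder {
        final String writerSchema;
        Reencoder(String writerSchema) { this.writerSchema = writerSchema; }
    }

    // IDs of record readers whose schema already matched: no re-encoding needed.
    private final Set<UUID> noReencodingNeeded = new HashSet<>();
    // One re-encoder per record reader ID that did need re-encoding.
    private final Map<UUID, Reencoder> reencoderCache = new HashMap<>();
    // Counts how often the expensive schema comparison actually runs.
    private int schemaComparisons = 0;

    /** Returns the re-encoder to use for this record, or null if none is needed. */
    Reencoder lookup(UUID readerId, String writerSchema, String readerSchema) {
        if (noReencodingNeeded.contains(readerId)) {
            return null;                      // already verified, skip equals()
        }
        Reencoder cached = reencoderCache.get(readerId);
        if (cached != null) {
            return cached;                    // reuse, skip equals()
        }
        schemaComparisons++;                  // first record from this reader only
        if (writerSchema.equals(readerSchema)) {
            noReencodingNeeded.add(readerId);
            return null;
        }
        Reencoder fresh = new Reencoder(writerSchema);
        reencoderCache.put(readerId, fresh);
        return fresh;
    }

    int cacheSize() { return reencoderCache.size(); }
    int comparisons() { return schemaComparisons; }
}
```

With this shape, two records carrying different UUIDs leave two entries in the cache, a
third record carrying the first UUID leaves the size unchanged, and any number of records
from one reader trigger a single schema comparison.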


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java, line 66
> > <https://reviews.apache.org/r/12480/diff/1/?file=320688#file320688line66>
> >
> >     verifiedRecordReaders -> noReencodingNeeded ?

Done


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java, line 155
> > <https://reviews.apache.org/r/12480/diff/1/?file=320688#file320688line155>
> >
> >     readability: pull out getRecordReaderID into its own var

Done


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java, line 78
> > <https://reviews.apache.org/r/12480/diff/1/?file=320689#file320689line78>
> >
> >     Need to write out the uuid too

Done


> On July 12, 2013, 10:44 p.m., Jakob Homan wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java, line 92
> > <https://reviews.apache.org/r/12480/diff/1/?file=320689#file320689line92>
> >
> >     Need to read in the uuid too

Done
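
As an illustration of the two comments above (writing the uuid out and reading it back in),
one possible way to carry a java.util.UUID through DataOutput/DataInput is shown below. The
helper class RecordReaderIdIO and its method names are hypothetical, and the byte layout
(two longs) is an assumption; the patch itself may serialize the ID differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.UUID;

// Hypothetical helper; not part of Hive. Shows one symmetric write/read pair for a UUID.
class RecordReaderIdIO {
    // Writes the 128-bit UUID as two longs, most significant half first.
    static void writeId(DataOutput out, UUID id) throws IOException {
        out.writeLong(id.getMostSignificantBits());
        out.writeLong(id.getLeastSignificantBits());
    }

    // Reads the two longs back in the same order and rebuilds the UUID.
    static UUID readId(DataInput in) throws IOException {
        long most = in.readLong();
        long least = in.readLong();
        return new UUID(most, least);
    }

    // Round-trips an ID through an in-memory buffer, as a quick self-check.
    static UUID roundTrip(UUID id) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeId(new DataOutputStream(bytes), id);
            return readId(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new IllegalStateException("in-memory streams should not fail", e);
        }
    }
}
```

The point of the pairing is that the read side must consume exactly what the write side
emits, in the same order; otherwise every field after the ID would be misread.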


- Mohammad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/#review23113
-----------------------------------------------------------


On July 11, 2013, 10:31 p.m., Mohammad Islam wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12480/
> -----------------------------------------------------------
> 
> (Updated July 11, 2013, 10:31 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jakob Homan.
> 
> 
> Bugs: HIVE-4732
>     https://issues.apache.org/jira/browse/HIVE-4732
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> From our performance analysis, we found that AvroSerde's schema.equals() call consumed a
> substantial amount (nearly 40%) of time. This patch intends to minimize the number of
> schema.equals() calls by deferring the check as late as possible and performing it as few
> times as possible.
> 
> First, we added a unique ID for each record reader, which is then included in every
> AvroGenericRecordWritable. Then we introduced two new data structures (one HashSet and one
> HashMap) to store intermediate data and avoid duplicate checks. The HashSet contains the IDs
> of all record readers that don't need any re-encoding. The HashMap contains the re-encoders
> already in use; it works as a cache and allows re-encoders to be reused. With this change,
> our test shows a nearly 40% reduction in Avro record reading time.
>  
>    
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java dbc999f
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java c85ef15
>   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java 66f0348
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 9af751b
>   serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb
> 
> Diff: https://reviews.apache.org/r/12480/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>

