hive-dev mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2555) Make the hashmap in map-side group by pluggable
Date Fri, 18 Nov 2011 17:01:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152964#comment-13152964 ]

jiraposter@reviews.apache.org commented on HIVE-2555:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2849/
-----------------------------------------------------------

(Updated 2011-11-18 16:59:12.336555)


Review request for Yongqiang He, Ning Zhang and namit jain.


Changes
-------

Removed a line that was incorrect and added one that better reflects what I should have had
in mind.


Summary (updated)
-------

Made the hash table in group-by pluggable: a class that supplies hash table functionality has
to implement the ExternalMap interface.
Currently I supply two (hopefully) fully working pluggable classes: ExternalJavaHashMap and
ExternalHPPCObjectObjectOpenHashMap (ExternalJavaMap is an abstract class that makes it easier
to add hash maps that implement the java.util.Map interface). ExternalMap has some unusual
methods that allow various tricks to increase efficiency. ExternalMap could easily be made
more general, but I decided that is not worth doing at this point (it would be if ExternalMap
were also used by things other than GroupByOperator).
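
To illustrate the idea of a pluggable map, here is a minimal sketch. The names
SketchExternalMap, SketchJavaHashMap and GroupCountSketch are hypothetical, and the method
set is an assumption: the real ExternalMap interface in the patch is not shown in this
message and has additional methods.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for the patch's ExternalMap; the real method set is not shown here.
interface SketchExternalMap<K, V> {
    V get(K key);
    V put(K key, V value);
    int size();
    Iterator<Map.Entry<K, V>> entries();
}

// Adapter that satisfies the sketch interface by delegating to java.util.HashMap.
final class SketchJavaHashMap<K, V> implements SketchExternalMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();

    @Override public V get(K key) { return delegate.get(key); }
    @Override public V put(K key, V value) { return delegate.put(key, value); }
    @Override public int size() { return delegate.size(); }
    @Override public Iterator<Map.Entry<K, V>> entries() { return delegate.entrySet().iterator(); }
}

final class GroupCountSketch {
    // Counts rows per key using only the interface, so the backing map can be swapped.
    static int countGroups(SketchExternalMap<String, Long> map, String[] keys) {
        for (String key : keys) {
            Long old = map.get(key);
            map.put(key, old == null ? 1L : old + 1L);
        }
        return map.size();
    }
}
```

The caller only sees the interface, so another implementation (for example one backed by an
HPPC map) could be substituted without touching the aggregation loop.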

Additionally, a trie implementation was added, but it does not yet support the full
functionality (it currently supports only String, int, long and bool columns).

I strongly dislike removing 10% of the hash map in GroupByOperator.flush(), since no HashMap
implementation known to me offers an efficient and clean way to do it; maybe something can
be done about that flushing. Additionally, the method that removes 10% has not yet been
tested to check that it works properly with the new implementations.
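
For reference, a partial flush over a java.util.Map can be sketched with an iterator, as
below. This is an illustration only, not the code in GroupByOperator.flush(); the class
PartialFlushSketch, the method flushPercent and the sink callback are hypothetical names.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;

final class PartialFlushSketch {
    // Forwards roughly `percent` percent of the entries to `sink` and removes them.
    // Which entries are chosen depends on the map's iteration order.
    static <K, V> int flushPercent(Map<K, V> map, int percent, BiConsumer<K, V> sink) {
        int target = (map.size() * percent + 99) / 100; // round up
        int removed = 0;
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (removed < target && it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            sink.accept(entry.getKey(), entry.getValue());
            it.remove(); // safe removal during iteration
            removed++;
        }
        return removed;
    }
}
```

This relies on Iterator.remove(), which is exactly the part that a non-java.util map
implementation may not offer in an efficient form.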

At this point the new libraries' jars (all under the Apache License 2.0) are added in a
way that "just works"; if there is a more proper way of adding jars, I am not aware of
it.

Because all keys are now passed in KeyWrappers, there is a large overhead. Which "hashmap"
implementation behaves best may change a lot if primitive Java types were used at some
point: now, each time the hash map needs to make a comparison, the value has to be
extracted from the key wrapper.
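
The indirection can be pictured with the sketch below. SketchKeyWrapper is a hypothetical
class, not Hive's KeyWrapper: it only shows that every hash lookup and equality check goes
through the wrapper and on to the wrapped field values, rather than operating on a primitive.

```java
import java.util.Arrays;

// Hypothetical wrapper around the group-by key columns of one row.
final class SketchKeyWrapper {
    private final Object[] fields;
    private final int hash; // cached, so hashing the fields happens once per wrapper

    SketchKeyWrapper(Object... fields) {
        this.fields = fields.clone();
        this.hash = Arrays.hashCode(this.fields);
    }

    @Override public int hashCode() { return hash; }

    // Called by the hash map on candidate matches; unwraps and compares each field.
    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SketchKeyWrapper)) return false;
        return Arrays.equals(fields, ((SketchKeyWrapper) other).fields);
    }
}
```

A map specialized for primitive keys would avoid both the wrapper allocation and the
field-by-field comparison, which is why the ranking of implementations could shift.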

A lot of the implementation so far was done quite crudely, meaning some operations could
be further optimized even without changing current Hive code. Still, this patch should at
this point sufficiently show proof of concept and that it is, I believe, worth continuing
work here.

Currently the conf is set to use the regular Java HashMap implementation.
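
Choosing the implementation from configuration could look roughly like the sketch below.
The property key and the class MapFactorySketch are hypothetical; the actual entry added to
hive-default.xml is not quoted in this message, and java.util.Properties stands in for the
Hive configuration object.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class MapFactorySketch {
    // Hypothetical key; not the name used in the patch.
    static final String IMPL_KEY = "sketch.groupby.hashmap.impl";

    // Instantiates the configured Map class, defaulting to java.util.HashMap.
    @SuppressWarnings("unchecked")
    static <K, V> Map<K, V> newMap(Properties conf) {
        String className = conf.getProperty(IMPL_KEY, HashMap.class.getName());
        try {
            Class<?> cls = Class.forName(className);
            if (!Map.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " does not implement java.util.Map");
            }
            return (Map<K, V>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```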

Possibly the new HashMap implementations should be moved to another package.


This addresses bug HIVE-2555.
    https://issues.apache.org/jira/browse/HIVE-2555


Diffs
-----

  trunk/build-common.xml 1202523 
  trunk/conf/hive-default.xml 1202523 
  trunk/lib/hppc-0.4.1.jar UNKNOWN 
  trunk/lib/patricia-trie-0.6.jar UNKNOWN 
  trunk/ql/build.xml 1202523 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1202523 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1202523 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalHPPCObjectObjectOpenHashMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaHashMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalPatriciaTrie.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ListKeyWrapperAnalyzer.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ObjectObjectExpandedOpenHashMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/PrivateInstantiator.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/TextKeyWrapperAnalyzer.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1202523 

Diff: https://reviews.apache.org/r/2849/diff


Testing
-------

Worked on some sample queries with each implementation added. 


Thanks,

Robert


                
> Make the hashmap in map-side group by pluggable
> -----------------------------------------------
>
>                 Key: HIVE-2555
>                 URL: https://issues.apache.org/jira/browse/HIVE-2555
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Namit Jain
>         Attachments: HIVE-2555.2.patch, HIVE-2555.3.patch, HIVE-2555.4.patch
>
>
> There are a couple of implementations available (other than java.util.HashMap) - COLT, TROVE etc. to name a few.
> If the hashmap was pluggable, it would be easy to play around with different hash maps and tune performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
