hive-dev mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2555) Make the hashmap in map-side group by pluggable
Date Fri, 18 Nov 2011 02:49:54 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152604#comment-13152604 ]

jiraposter@reviews.apache.org commented on HIVE-2555:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2849/
-----------------------------------------------------------

(Updated 2011-11-18 02:48:41.568039)


Review request for Yongqiang He, Ning Zhang and namit jain.


Changes
-------

In version 3 of this patch, the patricia-trie only worked with group-bys on a single string column.
It should now work with group-bys on any number of columns of the following types:
String, int, long, byte, bool.

This has not been thoroughly tested yet, but it worked on some sample queries.

The code added in this patch version is unnecessarily long. Unfortunately, shortening it in
any decent way would require non-minor refactoring of the existing Hive code (basically,
when the part of Hive responsible for comparing objects/object inspectors, getting values,
etc. was designed and implemented, it was to a large extent forgotten that Java has polymorphism).


Summary
-------

Made the hash table in group-by pluggable: a class that supplies hash table functionality has to
implement the ExternalMap interface. Currently I have supplied two of them: ExternalJavaHashMap and
ExternalHPPCObjectObjectOpenHashMap (ExternalJavaMap is an abstract class that makes it easier to
add hash maps that implement the java.util.Map interface). ExternalMap has some strange methods
that allow various tricks that can increase efficiency. ExternalMap could easily be made more
general, but I decided it is not worth doing at this point (it could be if ExternalMap were also
to be used by things other than GroupByOperator).
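
For illustration only (this is not the code in the patch), the pluggable design described
above might be sketched roughly as follows. All names here (SimpleExternalMap,
SimpleJavaMapAdapter, SimpleJavaHashMap, GroupCounter) are hypothetical stand-ins, and the
real ExternalMap interface has additional methods that are not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, reduced stand-in for the patch's ExternalMap interface.
interface SimpleExternalMap<K, V> {
    V get(K key);
    void put(K key, V value);
    int size();
    void clear();
}

// Adapter over any java.util.Map, in the spirit of the abstract base class
// mentioned above: subclasses only choose the concrete Map.
abstract class SimpleJavaMapAdapter<K, V> implements SimpleExternalMap<K, V> {
    private final Map<K, V> delegate;

    protected SimpleJavaMapAdapter(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override public V get(K key) { return delegate.get(key); }
    @Override public void put(K key, V value) { delegate.put(key, value); }
    @Override public int size() { return delegate.size(); }
    @Override public void clear() { delegate.clear(); }
}

// One concrete plug-in backed by java.util.HashMap.
final class SimpleJavaHashMap<K, V> extends SimpleJavaMapAdapter<K, V> {
    SimpleJavaHashMap() {
        super(new HashMap<K, V>());
    }
}

// Toy "group by key, count rows" consumer that depends only on the interface,
// so a different map implementation can be swapped in without changing it.
final class GroupCounter {
    private final SimpleExternalMap<String, Long> counts;

    GroupCounter(SimpleExternalMap<String, Long> counts) {
        this.counts = counts;
    }

    void add(String key) {
        Long old = counts.get(key);
        counts.put(key, old == null ? 1L : old + 1L);
    }

    long count(String key) {
        Long c = counts.get(key);
        return c == null ? 0L : c;
    }

    int groups() {
        return counts.size();
    }
}
```

In this sketch, trying another hash map means writing one more small class that implements
SimpleExternalMap and passing it to GroupCounter.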

I strongly dislike removing 10% of the hashmap in GroupByOperator.flush(), since no HashMap
implementation known to me supplies an efficient and clean way to do it. Maybe something can be
done about that flushing.
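
To make the concern concrete, here is a minimal sketch of partially flushing a
java.util.Map through its entry iterator. It is an assumption about what such a flush could
look like, not the code in GroupByOperator.flush(); the class PartialFlush, the method
flushPercent and the sink callback are hypothetical names.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;

final class PartialFlush {
    private PartialFlush() {}

    // Forwards roughly `percent` percent of the entries to `sink` and removes
    // them from the map. Returns the number of entries removed.
    static <K, V> int flushPercent(Map<K, V> map, int percent, BiConsumer<K, V> sink) {
        int toRemove = map.size() * percent / 100;
        int removed = 0;
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (removed < toRemove && it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            sink.accept(entry.getKey(), entry.getValue());
            it.remove(); // the only safe way to remove while iterating
            removed++;
        }
        return removed;
    }
}
```

With java.util.HashMap the iteration order is unspecified, so which entries get flushed is
arbitrary. Map implementations that do not support removal through an iterator would need a
different approach.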

At this point the hppc jar is added in a way that "just works"; if there is a more proper way
of adding jars, I am not aware of it.


This addresses bug HIVE-2555.
    https://issues.apache.org/jira/browse/HIVE-2555


Diffs (updated)
-----

  trunk/build-common.xml 1202523 
  trunk/conf/hive-default.xml 1202523 
  trunk/lib/hppc-0.4.1.jar UNKNOWN 
  trunk/lib/patricia-trie-0.6.jar UNKNOWN 
  trunk/ql/build.xml 1202523 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 1202523 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapperFactory.java 1202523 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalHPPCObjectObjectOpenHashMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaHashMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalJavaMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ExternalPatriciaTrie.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ListKeyWrapperAnalyzer.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/ObjectObjectExpandedOpenHashMap.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/PrivateInstantiator.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/util/TextKeyWrapperAnalyzer.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java 1202523 

Diff: https://reviews.apache.org/r/2849/diff


Testing
-------

Worked on some sample queries and passed queries_properties.q


Thanks,

Robert


                
> Make the hashmap in map-side group by pluggable
> -----------------------------------------------
>
>                 Key: HIVE-2555
>                 URL: https://issues.apache.org/jira/browse/HIVE-2555
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Namit Jain
>            Assignee: Robert Surówka
>         Attachments: HIVE-2555.2.patch, HIVE-2555.3.patch
>
>
> There are a couple of implementations available (other than java.util.HashMap) - COLT, TROVE etc. to name a few.
> If the hashmap was pluggable, it would be easy to play around with different hash maps and tune performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
