hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gunther Hagleitner" <ghagleit...@hortonworks.com>
Subject Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases
Date Fri, 21 Feb 2014 19:20:37 GMT


> On Feb. 21, 2014, 4:44 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java, line
64
> > <https://reviews.apache.org/r/18230/diff/3/?file=499109#file499109line64>
> >
> >     this and the next field (vectorized) is going to be neigh impossible to maintain.
you really care about fixed length here is that it? we should add this to objectinstpectorutils
or something like that. i think there's already some code in hive that returns size of datatype
to you.
> 
> Sergey Shelukhin wrote:
>     These sizes are actually the ones DataOutput implementation below writes. The thing
I care about is that serialized keys are comparable.
>     Why would they be impossible to maintain?

A different thought: Have you considered using HiveKey/BinarySortableSerDe for this? Would
this create more overhead? I think that SerDe uses vInts for the datatypes you support and
the result is binary comparable. There might be some fixed overhead you don't want - but if
we could reuse some of that code there wouldn't be a problem of maintaining this specific
key stuff.

The sizes you have in the map look like the java datatype sizes, that's why I was suggesting
using the Utils. Either way - if you could move that logic (partly) to the proper utils there's
a better chance that someone adding/changing datatypes will catch it.


> On Feb. 21, 2014, 4:44 a.m., Gunther Hagleitner wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java, line 177
> > <https://reviews.apache.org/r/18230/diff/3/?file=499105#file499105line177>
> >
> >     this seems ugly to me. can't you delegate to the hashtable loader to decide
which class to use?
> 
> Sergey Shelukhin wrote:
>     this is in the operator, hashtable is already loaded so there's no loader, and key
is already decided.
>     We want to ensure that 
>     1) We use the same key as that table.
>     2) We don't make decision for each key separately rechecking again and again... but
to decide based on previous key we need previous key.

I don't understand the arguments. The hashtable loader is part of the operator, so you can
still use it, you don't have to make the decision every time, it's up to you how you implement
that in the loader and the loader knows what table it created so it's a good place to enforce
conformity, isn't it?


- Gunther


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18230/#review35132
-----------------------------------------------------------


On Feb. 21, 2014, 6:40 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18230/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2014, 6:40 p.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Jitendra Pandey.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> See JIRA
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a182cd7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 24f1229 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 46e37c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java c0f4cd7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 61545b5

>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 2ac0928 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyObject.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 9ce0ae6

>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
83ba0f0 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 47f9d21 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 581046e

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 2466a3b

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java 997202f

>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
c541ad2 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinKey.java a103a51

>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
61c5741 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 2cb1ac3 
> 
> Diff: https://reviews.apache.org/r/18230/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message