hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xuefu Zhang" <xzh...@cloudera.com>
Subject Re: Review Request 18925: HIVE-6575 select * fails on parquet table with map datatype
Date Sun, 09 Mar 2014 01:12:36 GMT


> On March 8, 2014, 12:33 a.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java,
line 154
> > <https://reviews.apache.org/r/18925/diff/2/?file=513918#file513918line154>
> >
> >     I guess I wasn't clear. It's not inapproapriate, but goes beyond its responsibility.
Equality implementation is within a context which is the class. The instance to be checked
doesn't necessarily has the runtime class info. In fact, the context shouldn't be aware the
runtime class of these instances, as child classes can be added any time. Doing getClass ==
other.getClass() goes beyond the current context.
> >     
> >     What's more appropriate to check type compatibility by calling if (other instanceof
this.class). This is different from checking this.getClass() == other.getClass().
> >     
> >     Take Java ArrayList.equals() method as an example. (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/AbstractList.java#AbstractList.equals%28java.lang.Object%29).
This method doesn't do runtime class check. The implementation is saying, other.getClass()
doesn't have to be ArrayList, but has to be an instance of ArrayList. It could be an instance
of MyArrayList as long as MyArrayList is inherited from ArrayList.
> >     
> >     If we think it's more protical to do such a check, we'd expect that ArrayList.equals()
would also check this.getClass() == other.getClass().
> >     
> >     Btw, I don't understand how it breaks transitivity by removing this check.
> >     
> >     I understand this check was there before your change. I missed it in my previous
review.
> >
> 
> Szehon Ho wrote:
>     Hm I actually did not realize that Java's code has that for collections, thanks for
pointing that out.  I suppose in list case, the semantic is the user doesn't care about list
implementation, but about the contents. 
>     
>     What I meant about breaking the transitive property if you allow each class to decide:
 Say we remove the check of RT class equality.  There is a subclass called 'A' which choose
to override equal to return true only if 'other' is A.  Another subclass 'B' doesn't override
.equals, and by inheritance can return true if 'other' is any subclass of parent (A or B).
 A.equals(B) is false, B.equals(A) is true, breaking transitive.  Now that I think about it,
this argument doesn't justify having the parent one way or another, all I meant is that a
class cannot implement .equals just in its own context as you mentioned, all subclass must
choose the same way to be consistent, and I felt that having this check at the parent would
ensure that all the children followed it.
>     
>     But coming back down to this particular issue, I still don't think its safe to remove
that check.  There are two subclass of AbstractParquetMapInspector, the Deep and Standard
one depending on the type of map.  If we don't do this check, then Deep will be considered
equal to Standard, and perhaps the wrong one may be returned from cache and used in the inspection,
they are not interchangeable.  This is unlike java list,map, here the actual class matters
more than the content.  At least that is my understanding looking at the code.

okay. Frankly, I don't know what's the difference between the two child class: the whole parquet
code is very confusing. Since the code was there before this, it's fine to keep it as it is.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18925/#review36586
-----------------------------------------------------------


On March 8, 2014, 12:01 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18925/
> -----------------------------------------------------------
> 
> (Updated March 8, 2014, 12:01 a.m.)
> 
> 
> Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The issue is, as part of select * query, a DeepParquetHiveMapInspector is used for one
column of an overall parquet-table struct object inspector.  
> 
> The problem lies in the ObjectInspectorFactory's cache for struct object inspector. 
For performance, there is a cache keyed on an array list, of all object inspectors of columns.
 The second time the query is run, it attempts to lookup cached struct inspector.  But when
the hashmap looks up the part of the key consisting of the DeepParquetHiveMapInspector, java
calls .equals against the existing DeepParquetHivemapInspector.  This fails, as the .equals
method casted the "other" to a "StandardParquetHiveInspector".
> 
> Regenerating the .equals and .hashcode from eclipse.  
> 
> Also adding one more check in .equals before casting, to handle the case if another class
of object inspector gets hashed to the same hashcode in the cache.  Then java would call .equals
against the other, which in this case is not of the same class.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java
1d72747 
> 
> Diff: https://reviews.apache.org/r/18925/diff/
> 
> 
> Testing
> -------
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message