impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitris Tsirogiannis (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2840: Don't store table location in partition location
Date Tue, 15 Mar 2016 22:00:33 GMT
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2840: Don't store table location in partition location
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2355/4/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 204: public class Location
> I mentioned this in another comment, but it may have been obscured.
An HdfsPartition still has access to the parent table. So instead of accessing the compressor/decompressor
data structures directly, you'd be making a call to the corresponding HdfsTable api to store/get
prefixes. At the same time, Location is used for storing partition locations, so I feel it
naturally belongs under HdfsPartition. Another alternative would be to make it a generic util
class that is used to compress locations.


Line 237: // Combine with some random constants (from uuidgen -r) to make Locations with
        :       // identical 'suffix_'s but different 'prefix_'s hash to different values.
        :       final long m = 0xc6bfaf3a929b49e1L;
        :       final long a = 0xa9591152f59b46d7L;
        :       long long_prefix = prefix_;
        :       long_prefix = (long_prefix * m + a) >>> 32;
        :       int prefix_hash = (int)long_prefix;
        :       return suffix_.hashCode() ^ prefix_hash;
> simpler, but probably more expensive. Do you think it's worth it?
I would say yes because it simplifies the code. I don't believe the overhead of Location.hashCode
will have any noticeable impact on performance. Unless we prove otherwise, I'd go with code
simplicity :)


Line 275: What is left is the prefix.
> I thought that was what we had discussed? What do you think should happen i
Sorry if it wasn't clear. What I had in mind was detecting N key=value pairs in the location
that correspond to partition columns, otherwise falling back to using the entire location
as a suffix.


-- 
To view, visit http://gerrit.cloudera.org:8080/2355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c67b6ce0f83de2f5277a528a9ce67e47d638adb
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message