impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
Date Thu, 26 Jan 2017 23:36:01 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5743/1//COMMIT_MSG
Commit Message:

PS1, Line 22: The treatment of such
            : partitions is very inefficient.
> I could be wrong about it, but I don't think this is an accurate descriptio
You are exactly right. Modified comment.


http://gerrit.cloudera.org:8080/#/c/5743/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 263
> Oh never mind, i didn't read the change description carefully enough.
Glad to hear we arrived at the same fix :)


Line 266:         // unpartitioned table, or the path of a partition with a custom location.
> how does the 'custom location' part follow?
You are right, it does not always follow. Rephrased.


PS1, Line 698: // TODO: We can still do some advanced optimization by grouping all the partition
             :     // directories under the same ancestor path up the tree.
             :     List<Path> dirsToLoad = Lists.newArrayList(tblLocation);
> We already have the ability to compress partition paths by removing the com
1. This TODO refers to partitions with custom locations specifically. Those will never go
through L274-280, but always through L267-269.

2. I'm not even convinced this optimization is easy/practical. At least it's complicated enough
to investigate in a different JIRA/patch. For example, consider something like this:

HDFS paths:
/warehouse/staging/2017/a
/warehouse/staging/2017/b
/warehouse/staging/2017/c
/warehouse/staging/2016/a
/warehouse/staging/2016/b
/warehouse/staging/2016/c

Now let's way we have two partitions with these custom locations:
/warehouse/staging/2017/a
/warehouse/staging/2016/a

Their common ancestor is:
/warehouse/staging/

We will load all files under the common ancestor from the NN. Might be good or might be a
disaster.

It doesn't sound trivial to make this optimization robust.


Line 760:       partsByPath.put(partDirPath, Lists.newArrayList(partition));
> do we know that on this path it will be a qualified path, or do we not care
Good point. As long as the qualification is consistent, we're ok. I'm inclined to leave it
as is because qualification
complicates the code due to additional exception handling in various places (qualification
can throw IOException). What do you think?


http://gerrit.cloudera.org:8080/#/c/5743/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 429:    * be qualified or unqualified paths for this function to behave as expected.
> add a precondition that enforces this?
I'm a little worried about that for the following reasons:

1. If there is still a bug lurking somewhere this may lead to failed table loads (Preconditions
check will throw), instead of table loads possibly being slow.

2. The Preconditions check in itself is pretty expensive (see the new code).

3. If something with the path qualification is going wrong, then the modified log message
in HdfsTable.loadMetadataAndDiskIds() will give us a good indication of that.

So instead of a Preconditions check I added a LOG message. How does that sound?


Line 433:     while (!p.isRoot() && p.depth() != parent.depth()) p = p.getParent();
> Now that we are here, should we consider fixing this method to make it more
I'd prefer to leave that as follow-on work because it requires additional investigation. Based
on Mostafa's
profiling, we don't have evidence to indicate this function is a bottleneck, so why fix it
if it's not broken?


-- 
To view, visit http://gerrit.cloudera.org:8080/5743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message