Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id B18C8200BD0 for ; Wed, 30 Nov 2016 22:21:57 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id B03F6160B13; Wed, 30 Nov 2016 21:21:57 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id DEDD1160B06 for ; Wed, 30 Nov 2016 22:21:56 +0100 (CET) Received: (qmail 73502 invoked by uid 500); 30 Nov 2016 21:21:56 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 73491 invoked by uid 99); 30 Nov 2016 21:21:55 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Nov 2016 21:21:55 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 6D87AC1372 for ; Wed, 30 Nov 2016 21:21:55 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.363 X-Spam-Level: X-Spam-Status: No, score=0.363 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id UYWxTbhIr-vZ for ; Wed, 30 Nov 2016 21:21:54 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id EE5FF5F366 for ; Wed, 30 Nov 2016 21:21:53 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id uAULLa2v026316; Wed, 30 Nov 2016 21:21:36 GMT Message-Id: <201611302121.uAULLa2v026316@ip-10-146-233-104.ec2.internal> Date: Wed, 30 Nov 2016 21:21:36 +0000 From: "Bharath Vissapragada (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Mostafa Mokhtar , Dimitris Tsirogiannis , Alex Behm Reply-To: bharathv@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4172/IMPALA-3653=3A_Improvements_to_block_metadata_loading=0A?= X-Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 X-Gerrit-ChangeURL: X-Gerrit-Commit: 539c101855fe5c1a150e7208012728a08cad166b In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Wed, 30 Nov 2016 21:21:57 -0000 Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading ...................................................................... Patch Set 13: (13 comments) http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: PS13, Line 41: private DiskIdMapper() { : } > nit: single line Done PS13, Line 44: "storage ID UUID string" > nit: why quoted? Unquoted Line 55: * Returns a disk id (0-based) index for storageId on host 'host'. > This function doesn't only return a disk id (implying this is a getter), it Made the comment little detailed. PS13, Line 68: .intValue() > why do you need this? Won't this be unboxed automatically? Done PS13, Line 69: synchronized (storageIdGenerator) { > The common path now includes two calls to storageUuidToDiskId.get(), one sy Discussed in person. It was put outside the synchronized block so that callers with already mapped storage ids needn't enter the synchronized block. PS13, Line 74: stoprageUuid > typo: storageUuid Done http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS13, Line 315: if (!FileSystemUtil.isChildPath(partDir, dirPath)) continue; > Is this check really needed? Haven't you done this check while populating p Currently we don't do it at the callers. So I added the check here. partsByPath includes all the partitions that need to be updated. However this method does only for descendants of dirPath. PS13, Line 316: perPartitionFileDescMap_ > If I recall correctly, I added this to speedup incremental loading of file Looking closely, yes this map doesn't make sense anymore. We are unnecessarily doing puts and gets into it. Removed.Nice catch btw. PS13, Line 431: > nit: extra space Done PS13, Line 441: // Synthesize the block metadata for the file descriptor. : long start = 0; : long remaining = fd.getFileLength(); : // Workaround HADOOP-11584 by using the filesystem default block size rather than : // the block size from the FileStatus. : // TODO: after HADOOP-11584 is resolved, get the block size from the FileStatus. : long blockSize = fs.getDefaultBlockSize(); : if (blockSize < MIN_SYNTHETIC_BLOCK_SIZE) blockSize = MIN_SYNTHETIC_BLOCK_SIZE; : Preconditions.checkState(partitions.size() > 0); : // For the purpose of synthesizing block metadata, we assume that all partitions : // with the same location have the same file format. : HdfsFileFormat fileFormat = partitions.get(0).getFileFormat(); : if (!fileFormat.isSplittable(HdfsCompression.fromFileName(fd.getFileName()))) { : blockSize = remaining; : } : while (remaining > 0) { : long len = Math.min(remaining, blockSize); : List replicas = Lists.newArrayList( : new BlockReplica(hostIndex_.getIndex(REMOTE_NETWORK_ADDRESS), false)); : fd.addFileBlock(new FileBlock(start, len, replicas)); : remaining -= len; : start += len; : } > I would put this in a separate function. Done PS13, Line 474: numHdfsFiles_++; : totalHdfsBytes_ += partition.getSize(); > Is this counting correct here? We increment them for every fd (and its corresponding partition). We seem to be implementing reload() as drop() + add(). dropPartition() is subtracting them when required. Also loadPartitionFileMetadata() subtracts when we reload a subset of partitions. Do you think anything could be done better here? I agree its confusing since its done in multiple places. http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: PS13, Line 289: BlockLocation > Is this the name of the API call? If not, can you use the exact function na Rephrased it a little. Its no specific API call. Its just that only 'DistributedFileSystem's actually set storageIds in the BlockLocations and others don't. PS13, Line 418: child > child or descendant? Descendant. Updated the method name and callers. -- To view, visit http://gerrit.cloudera.org:8080/5148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes