Subject: Re: Review Request: speedup addInputPaths
From: "Ning Zhang"
To: "Ning Zhang", "Yongqiang He", "hive"
Reply-To: dev@hive.apache.org
Date: Wed, 15 Jun 2011 18:59:43 -0000
Message-ID: <20110615185943.13642.96972@reviews.apache.org>
In-Reply-To: <20110614210913.13642.74552@reviews.apache.org>
References: <20110614210913.13642.74552@reviews.apache.org>
X-ReviewBoard-URL: https://reviews.apache.org
X-ReviewRequest-URL: https://reviews.apache.org/r/898/
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="===============3970225765255668636=="

--===============3970225765255668636==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/898/#review842
-----------------------------------------------------------


trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java

    This change makes the order of paths in pathProcessed non-deterministic, which means mapred.input.dir will not have the same order as before. I'm not sure whether that is safe, but if you replace HashSet with LinkedHashSet, the order will be preserved.


trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java

    This needs a comment explaining why this case doesn't need to check for empty paths.

    In terms of efficiency, it seems to me that checking for empty paths is not the most expensive part (the number of RPCs is large, but each listStatus() call should be fast). We should also be able to cache the results of listStatus() for each path (this requires extending Utilities.isEmpty), since those results are needed anyway in other operations (computing splits, etc.). If


trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java

    It seems that we are duplicating the work done by FileInputFormat.setInputPaths(JobConf, String commaSeparatedPaths). I think it would be safer and cleaner to first build an array of paths (String[] paths) and then call:

        FileInputFormat.setInputPaths(job, StringUtils.stringToPath(paths));


trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java

    Indentation.


trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java

    Why do we need it here?


trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java

    !mrwork.getPartDescToRework().isEmpty()


trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java

    The logic here is too complex; I think it should be refactored. Is the following what you wanted?

        // true when every partition needs rework (placeholder predicate)
        if (allPartitionsAreRework()) {
          prepareNullCombineFilter(combine);
        } else {
          prepareNormalCombineFilter(combine);
        }

        InputSplitShim[] iss = combine.getSplits();


trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java

    Doesn't this just need a HashSet rather than a HashMap?


- Ning


On 2011-06-14 21:09:13, Yongqiang He wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/898/
> -----------------------------------------------------------
> 
> (Updated 2011-06-14 21:09:13)
> 
> 
> Review request for hive.
> 
> 
> Summary
> -------
> 
> speedup addInputPaths
> 
> 
> This addresses bug HIVE-2218.
>     https://issues.apache.org/jira/browse/HIVE-2218
> 
> 
> Diffs
> -----
> 
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1135335
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1135335
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 1135335
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1135335
> 
> Diff: https://reviews.apache.org/r/898/diff
> 
> 
> Testing
> -------
> 
> yes.
> 
> 
> Thanks,
> 
> Yongqiang
> 

--===============3970225765255668636==--
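The first ExecDriver.java comment in this review hinges on iteration order: HashSet makes no guarantee about the order in which its elements are returned, whereas LinkedHashSet iterates in insertion order. The following is a minimal standalone sketch, not code from the patch. It assumes the processed paths are plain strings joined with commas, like a comma-separated mapred.input.dir value; the class name PathOrderDemo, the method toInputDirString, and the sample paths are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PathOrderDemo {
    // Hypothetical helper: collects paths into a LinkedHashSet, so duplicates
    // are dropped but the first-seen order is kept, then joins them with commas.
    static String toInputDirString(List<String> paths) {
        Set<String> processed = new LinkedHashSet<>(paths);
        // String.join walks the set in its iteration order (insertion order here).
        return String.join(",", processed);
    }

    public static void main(String[] args) {
        // Hypothetical partition paths; "ds=1" appears twice.
        List<String> paths = Arrays.asList(
            "/warehouse/t/ds=3", "/warehouse/t/ds=1",
            "/warehouse/t/ds=2", "/warehouse/t/ds=1");
        // Prints /warehouse/t/ds=3,/warehouse/t/ds=1,/warehouse/t/ds=2
        System.out.println(toInputDirString(paths));
    }
}
```

Swapping in a plain HashSet would still deduplicate, but the joined string could come out in any order. LinkedHashSet keeps the same constant-time add and contains as HashSet and only adds a linked list through its entries, so the cost of the switch is a little extra memory per path.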