Subject: Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)
From: "Nitay Joffe"
To: "giraph", "Nitay Joffe", "Alessandro Presta"
Date: Fri, 15 Feb 2013 18:59:47 -0000
X-ReviewRequest-URL: https://reviews.apache.org/r/8611/

-----------------------------------------------------------
This is an automatically generated
e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 6:59 p.m.)


Review request for giraph.


Description
-------

For now this is only the Input side of things. One particular thing I added was the concept of "profiles", allowing for easily reading from multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note in the diff I separated the code so that there would be a Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed as an IOFormat back to Hive itself.

Also note the new (I think improved) interface: Users do not need to actually implement an XInputFormat anymore. They just create a class that implements the HiveVertexCreator interface, plug that in, and use HiveVertexInputFormat. Should make user code much cleaner.


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs (updated)
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4
  giraph-hive/pom.xml PRE-CREATION
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

In terms of performance this is on par with our current HCatalog stuff. I ran a few jobs and noticed at most a few seconds of difference between the input supersteps. Sometimes it was less, so I think the difference is mostly noise.


Thanks,

Nitay Joffe
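

Illustration (not part of the patch)
-------

The Description says users no longer write an input format themselves: they supply one small class that turns a Hive row into a vertex, and the framework does the reading. A minimal sketch of that plug-in pattern follows. Every name in it (SketchVertex, SketchVertexCreator, IdValueCreator, SketchVertexReader, and the "id"/"value" columns) is hypothetical and chosen for illustration only; rows are modeled as plain maps, and none of this is the actual Giraph or Hive API from the diff.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Demo entry point: feeds two stand-in rows through the reader. */
class HiveInputSketch {
  public static void main(String[] args) {
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(row(1L, 0.5));
    rows.add(row(2L, 1.5));
    SketchVertexReader reader = new SketchVertexReader(new IdValueCreator());
    for (SketchVertex v : reader.readAll(rows)) {
      System.out.println(v.id + " -> " + v.value);
    }
  }

  /** Builds a stand-in for one table row with hypothetical columns. */
  static Map<String, Object> row(long id, double value) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put("id", id);
    r.put("value", value);
    return r;
  }
}

/** Hypothetical vertex holding only an id and a value. */
final class SketchVertex {
  final long id;
  final double value;

  SketchVertex(long id, double value) {
    this.id = id;
    this.value = value;
  }
}

/** Hypothetical analogue of the per-row converter a user implements. */
interface SketchVertexCreator {
  SketchVertex create(Map<String, Object> row);
}

/** User code: the only class a job author writes in this sketch. */
final class IdValueCreator implements SketchVertexCreator {
  @Override
  public SketchVertex create(Map<String, Object> row) {
    return new SketchVertex((Long) row.get("id"), (Double) row.get("value"));
  }
}

/** Framework side: owns the iteration and delegates row conversion. */
final class SketchVertexReader {
  private final SketchVertexCreator creator;

  SketchVertexReader(SketchVertexCreator creator) {
    this.creator = creator;
  }

  List<SketchVertex> readAll(List<Map<String, Object>> rows) {
    List<SketchVertex> out = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      out.add(creator.create(row));
    }
    return out;
  }
}
```

The point of the split is that table access and iteration stay in one shared reader, so reading a different table only requires a different converter class.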