Subject: Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)
From: "Nitay Joffe"
To: "giraph", "Nitay Joffe", "Alessandro Presta"
Date: Fri, 15 Feb 2013 19:01:18 -0000
Message-ID: <20130215190118.9635.57379@reviews.apache.org>
In-Reply-To: <20130215190002.21380.1979@reviews.apache.org>
Reply-To: dev@giraph.apache.org
X-ReviewRequest-URL: https://reviews.apache.org/r/8611/
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="===============0675583926558679914=="

--===============0675583926558679914==
Content-Type: text/plain; charset="utf-8"

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 7:01 p.m.)


Review request for giraph.


Description (updated)
-------

One particular thing I added is the concept of "profiles", which allow for easily reading from and writing to multiple tables. This should remove a lot of the cruft around the GiraphHCat* classes.

Note that in the diff I separated the code so that there is a Giraph-unrelated, Hive-only portion (under package org.apache.hadoop.hive). Things under this package (and its children) do not touch any Giraph code, and so can be contributed back to Hive itself as an IOFormat.

Also note the new (I think improved) interface: users no longer need to implement an XInputFormat. They just create a class that implements the HiveToVertex (HiveToEdge, VertexToHive) interface, plug that in, and use HiveVertexInputFormat. This should make user code much cleaner.


This addresses bug GIRAPH-453.
https://issues.apache.org/jira/browse/GIRAPH-453


Diffs
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4
  giraph-hive/pom.xml PRE-CREATION
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

In terms of performance this is on par with our current HCatalog stuff. I ran a few jobs and noticed at most a few seconds of difference between the input supersteps. Sometimes it was less, so I think the difference is mostly noise.


Thanks,

Nitay Joffe

--===============0675583926558679914==--