giraph-dev mailing list archives

From "Nitay Joffe" <ni...@apache.org>
Subject Re: Review Request: GIRAPH-453: Pure Hive I/O (nitay)
Date Fri, 15 Feb 2013 18:59:47 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/
-----------------------------------------------------------

(Updated Feb. 15, 2013, 6:59 p.m.)


Review request for giraph.


Description
-------

For now this is only the input side of things. One particular thing I added is the concept
of "profiles", which makes it easy to read from multiple tables. This should remove a lot
of the cruft around the GiraphHCat* classes.
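Here is a minimal sketch of the "profiles" idea. It assumes a profile is simply a separate namespace of configuration keys for each input table. `HiveProfilesSketch`, `InputDescription`, and the `giraph.hive.profile.` key prefix are hypothetical names chosen for illustration. The actual `HiveProfiles` class in this diff may store its settings differently. `java.util.Properties` stands in for a Hadoop `Configuration` so the example runs without Hadoop on the classpath.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of "profiles": each input table gets its own
 * namespace of configuration keys, so several tables can be configured
 * side by side in one job. Not the real Giraph HiveProfiles API.
 */
public class HiveProfilesSketch {
  /** Hypothetical key prefix, not an actual Giraph configuration key. */
  static final String PREFIX = "giraph.hive.profile.";

  /** Minimal description of one Hive input: database, table, partition filter. */
  static final class InputDescription {
    final String dbName;
    final String tableName;
    final String partitionFilter;

    InputDescription(String dbName, String tableName, String partitionFilter) {
      this.dbName = dbName;
      this.tableName = tableName;
      this.partitionFilter = partitionFilter;
    }
  }

  /** Store a description under the given profile ID (values must be non-null). */
  static void writeProfile(Properties conf, String profileId, InputDescription desc) {
    String p = PREFIX + profileId + ".";
    conf.setProperty(p + "db", desc.dbName);
    conf.setProperty(p + "table", desc.tableName);
    conf.setProperty(p + "partition.filter", desc.partitionFilter);
  }

  /** Read a description back; returns null if the profile was never written. */
  static InputDescription readProfile(Properties conf, String profileId) {
    String p = PREFIX + profileId + ".";
    String table = conf.getProperty(p + "table");
    if (table == null) {
      return null;
    }
    return new InputDescription(
        conf.getProperty(p + "db", "default"),
        table,
        conf.getProperty(p + "partition.filter", ""));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    writeProfile(conf, "vertex_input",
        new InputDescription("default", "users", "ds='2013-02-15'"));
    writeProfile(conf, "edge_input",
        new InputDescription("default", "friendships", "ds='2013-02-15'"));
    // Both tables live in one configuration without their keys colliding.
    System.out.println(readProfile(conf, "vertex_input").tableName); // users
    System.out.println(readProfile(conf, "edge_input").tableName);   // friendships
  }
}
```

Namespacing the keys by profile ID lets a vertex table and an edge table be configured in the same job without overwriting each other's settings.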

Note that in the diff I separated the code so that there is a Giraph-independent, Hive-only
portion (under package org.apache.hadoop.hive). Classes in this package and its subpackages
do not touch any Giraph code, so they could be contributed back to Hive itself as an IOFormat.

Also note the new (I think improved) interface: users no longer need to implement an
XInputFormat. They just create a class that implements the HiveVertexCreator interface,
plug it in, and use HiveVertexInputFormat. This should make user code much cleaner.
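The following sketch shows the "implement a creator, not an InputFormat" pattern in rough form. Every type here (`Row`, `VertexCreator`, `GenericVertexReader`, `MyVertexCreator`) is a hypothetical, simplified stand-in. None of them are the real Giraph or Hive classes from this diff. The sketch only shows the division of labor: the framework owns record iteration, and user code supplies a single row-to-vertex conversion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, heavily simplified sketch of plugging a per-row creator
 * into a generic, framework-owned reader. Not the real Giraph/Hive API.
 */
public class VertexCreatorSketch {
  /** Stand-in for one Hive row: column values accessed by index. */
  interface Row {
    Object get(int column);
  }

  /** Minimal vertex: id, value, and outgoing edge targets. */
  static final class Vertex {
    final long id;
    final double value;
    final List<Long> edges;

    Vertex(long id, double value, List<Long> edges) {
      this.id = id;
      this.value = value;
      this.edges = edges;
    }
  }

  /** The only thing a user writes: turn one row into one vertex. */
  interface VertexCreator {
    Vertex createVertex(Row row);
  }

  /** Framework-owned reader: iterates rows and delegates conversion to the creator. */
  static final class GenericVertexReader implements Iterator<Vertex> {
    private final Iterator<Row> rows;
    private final VertexCreator creator;

    GenericVertexReader(Iterator<Row> rows, VertexCreator creator) {
      this.rows = rows;
      this.creator = creator;
    }

    @Override public boolean hasNext() { return rows.hasNext(); }

    @Override public Vertex next() { return creator.createVertex(rows.next()); }
  }

  /** Example user code, assuming columns (id BIGINT, value DOUBLE, neighbors ARRAY<BIGINT>). */
  static final class MyVertexCreator implements VertexCreator {
    @Override
    @SuppressWarnings("unchecked")
    public Vertex createVertex(Row row) {
      long id = (Long) row.get(0);
      double value = (Double) row.get(1);
      List<Long> neighbors = (List<Long>) row.get(2);
      return new Vertex(id, value, new ArrayList<>(neighbors));
    }
  }

  /** Build an in-memory row from column values (test helper). */
  static Row row(Object... columns) {
    return column -> columns[column];
  }

  public static void main(String[] args) {
    List<Row> table = Arrays.asList(
        row(1L, 0.5, Arrays.asList(2L, 3L)),
        row(2L, 1.0, Arrays.asList(3L)));
    GenericVertexReader reader =
        new GenericVertexReader(table.iterator(), new MyVertexCreator());
    while (reader.hasNext()) {
      Vertex v = reader.next();
      System.out.println(v.id + " -> " + v.edges); // "1 -> [2, 3]", then "2 -> [3]"
    }
  }
}
```

Because the reader is generic, swapping in a different table layout means writing a new `VertexCreator`. No reader or input-format code changes.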


This addresses bug GIRAPH-453.
    https://issues.apache.org/jira/browse/GIRAPH-453


Diffs (updated)
-----

  giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3
  giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java PRE-CREATION
  giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef
  giraph-hcatalog/pom.xml 4a8227295ca426cf273527cdf3c700d25c256ac2
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java fbcef720d3caa944af70a859996aac40a2f67558
  giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4
  giraph-hive/pom.xml PRE-CREATION
  giraph-hive/src/main/assembly/compile.xml PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveReadableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemaAware.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveTableSchemas.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveWritableRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiRecord.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/HiveApiTableSchema.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Classes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/FileSystems.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HadoopUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveMetastores.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/HiveUtils.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Inspectors.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/ProgressReporter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/SerDes.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/Writables.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/common/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiInputSplit.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/HiveApiRecordReader.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputPartition.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/InputSplitData.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/NoOpInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/BenchmarkArgs.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/CounterRatioGauge.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/InputBenchmark.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/MetricsObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/benchmark/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiOutputCommitter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/HiveApiRecordWriter.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/NoOpOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputConf.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/OutputInfo.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/impl/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveApiInputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/HiveInputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputFormat.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveApiOutputObserver.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/HiveOutputDescription.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/output/package-info.java PRE-CREATION
  giraph-hive/src/main/java/org/apache/hadoop/hive/api/package-info.java PRE-CREATION
  pom.xml f6e9302d694dab9a075de11ad00e6dcfc878e400

Diff: https://reviews.apache.org/r/8611/diff/


Testing
-------

Ran on some production jobs and verified results were exactly the same.

In terms of performance this is on par with our current HCatalog stuff. I ran a few jobs and
noticed at most a few seconds of difference in the input superstep times, and sometimes the
new code was faster, so I think the difference is mostly noise.


Thanks,

Nitay Joffe

