giraph-dev mailing list archives

From "Avery Ching" <avery.ch...@gmail.com>
Subject Re: Review Request 15142: GIRAPH-789: Upgrade hive-io to 0.20 - less metastore accesses
Date Thu, 31 Oct 2013 20:07:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15142/#review27948
-----------------------------------------------------------

Ship it!


+1, this is awesome work, Maja. Jobs will now fail faster when there are metastore issues,
and this also cuts back on metastore accesses.  Yay!


giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/15142/#comment54396>

    Maybe worth adding a top-level comment for this method that says something like:
    For all Hive vertex inputs, add the user settings to the configuration.  Additionally,
    this checks the input specs for every input, which caches metadata in the configuration
    to eliminate worker access to the metastore and to fail earlier if the metadata doesn't
    exist.  In the case of multiple vertex input descriptions, metadata is cached in each
    vertex input format description and then saved into a single Configuration via JSON.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/15142/#comment54399>

    Maybe worth adding a top-level comment for this method that says something like:
    For all Hive edge inputs, add the user settings to the configuration.  Additionally,
    this checks the input specs for every input, which caches metadata in the configuration
    to eliminate worker access to the metastore and to fail earlier if the metadata doesn't
    exist.  In the case of multiple edge input descriptions, metadata is cached in each
    edge input format description and then saved into a single Configuration via JSON.
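
To illustrate the behavior both suggested comments describe, here is a minimal sketch of the cache-then-read pattern. It is not the hive-io or HiveGiraphRunner code: the class name, the key prefix, the helper methods, and the plain Map standing in for a Hadoop Configuration are all hypothetical, and the JSON step for multiple input descriptions is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: every name here is hypothetical, not the actual hive-io API. */
public class MetastoreCacheSketch {
  /** Counts simulated metastore round trips. */
  static int metastoreCalls = 0;

  /** Stand-in for a metastore lookup; throws if the table does not exist. */
  static String fetchSchemaFromMetastore(String table) {
    metastoreCalls++;
    if (table.startsWith("missing")) {
      throw new IllegalStateException("No such table: " + table);
    }
    return "schema-of-" + table;
  }

  /** Hypothetical configuration key under which a table's schema is cached. */
  static String schemaKey(String table) {
    return "sketch.hive.schema." + table;
  }

  /**
   * Runner side: resolve every input once and cache the result in the
   * configuration. Missing metadata fails here, before any worker starts.
   */
  static void cacheInputSchemas(Map<String, String> conf, String... tables) {
    for (String table : tables) {
      conf.put(schemaKey(table), fetchSchemaFromMetastore(table));
    }
  }

  /**
   * Worker side: use the cached schema when present, and fall back to the
   * metastore only when the runner did not fill the configuration.
   */
  static String schemaFor(Map<String, String> conf, String table) {
    String cached = conf.get(schemaKey(table));
    return cached != null ? cached : fetchSchemaFromMetastore(table);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    cacheInputSchemas(conf, "vertices", "edges");
    int callsAfterSetup = metastoreCalls;

    schemaFor(conf, "vertices");
    schemaFor(conf, "edges");

    System.out.println("runner calls: " + callsAfterSetup);                     // runner calls: 2
    System.out.println("worker calls: " + (metastoreCalls - callsAfterSetup));  // worker calls: 0
  }
}
```

The fallback in schemaFor mirrors the note in the review description that jobs still work without the runner, at the cost of metastore accesses from workers.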


- Avery Ching


On Oct. 31, 2013, 6:43 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15142/
> -----------------------------------------------------------
> 
> (Updated Oct. 31, 2013, 6:43 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Bugs: GIRAPH-789
>     https://issues.apache.org/jira/browse/GIRAPH-789
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> Currently each worker sends multiple requests to the metastore to get info about I/O
> formats, which is unnecessary and can cause issues when the metastore is having problems.
> 
> Hive-io has changed so that it doesn't access the metastore when schema/table info is
> already present in the Configuration, and HiveGiraphRunner now initializes all the formats
> to fill in the Configuration. If HiveGiraphRunner is not used, everything will still work,
> but workers will access the metastore.
> 
> 
> Diffs
> -----
> 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6b8a8e9 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java b809413 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java 534a773 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java d5c1279 
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java c4813fb 
>   pom.xml f2981ff 
> 
> Diff: https://reviews.apache.org/r/15142/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> 
> Ran jobs with single and multiple input formats, with added logging for each metastore
> call in hive-io. For example, with a single vertex input, a single edge input, and a single
> output, each worker now makes no metastore calls instead of 8. The number of calls from the
> master is also reduced: we only get input partition descriptions at the beginning of the
> job and make no calls at the end (for output). The only call left at the end is from the
> cleanup task, to register the new partition. The cleanup task used to make two additional
> calls, which are also removed.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>

