giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-214) GiraphJob should have configuration split out of it to be cleaner (GiraphConf)
Date Tue, 14 Aug 2012 16:30:38 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434235#comment-13434235
] 

Eli Reisman commented on GIRAPH-214:
------------------------------------

I too like the idea of a single repository of internal values since its easy for new folks
to review the cmd-line options in one place, although perhaps having domain-specific comments
replace the in-code definitions so people who look to that code to find them or how they operate
can be pointed to GiraphConf as well.

There is a bigger problem. Its used all over the place, leaving our side from GiraphJob and
coming back (as a copy of the Configuration or GiraphConf we put in, not an original) wrapped
in a org.apache.hadoop.mapred.JobConf inside the Mapper.Context passed to GraphMapper and
spread around EVERYWHERE from there via said Context (sometimes in the form of a JobContext.)
This is where the inheritance gets ugly:

The class hierarchy coming out of Hadoop is Configuration (no interface to route calls from,
just a class) and the JobConf inherits from that. When I attempt to make GiraphConf inherit
from JobConf so I can do the cast when we get the JobConf, it compiles but (as you see in
option 1) does not make it past the runtime and the tests. I was hoping this was because I
picked the wrong JobConf (there's .mapred, .mapreduce, .mappred_v2 stuff in the Hadoop code)
but there's only one JobConfImpl I see in the Hadoop code coming out of .mapred just as GiraphConf
does. So for some reason it won't cast.

If I try to wrap the JobConf rather than inherit, I have to constantly allow "raw access"
to the Conf through a method such as getRawConf() which crops up anytime a user or anyone
else does requires raw access to things like getInt() setClass() etc for application-specific
stuff that should not be defined internally. This seems to add confusion at the user level
which started to get frustrating and/or defeat the purpose of keeping users and some plumbing
out of the raw access and keep them using getters/setters. I could add an input arg to give
the wrapped instance directly to the places where GraphMapper hands it out to all the other
components inside of a JobContext or Mapper.Context, but if I let those components get it
as they do now (or in the case of some of the IO stuff if they too get it from Hadoop) then
we have to create a bunch of GiraphConf wrappers on every node, and that seems wasteful too.

I could certainly attempt this if you like, this was option 2. Ideally, I'd like to figure
out why it won't let us just inherit JobConf. I think it's because its a copy of the "raw
Configuration" wrapped in JobConf when it comes back to us, and not our original GiraphConf
any more, but then why can't I copy on the way back out of Hadoop when my GiraphConf inherits
from JobConf and ought to have that ability (also sets off the compiler or tests.)

I will play with it more today but its very puzzling. And it does compile without complaint
as is!

So basically: group vote on central config options in general (Avery is a strong 'yes' and
I'm a bit on the fence but more 'yes' than 'no', Jakob's a strong 'no', others say...?) and
calling all Hadoop masters -- whats up with the JobConf?!?

                
> GiraphJob should have configuration split out of it to be cleaner (GiraphConf)
> ------------------------------------------------------------------------------
>
>                 Key: GIRAPH-214
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-214
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Avery Ching
>            Assignee: Eli Reisman
>            Priority: Minor
>         Attachments: GIRAPH-214-1.patch, GIRAPH-214-2.patch, GIRAPH-214-3.patch, GIRAPH-214-4.patch,
GIRAPH-214-5-option1.patch, GIRAPH-214-6-option1.patch
>
>
> Currently all the configuration for Giraph is part of GiraphJob, making things messy
for GiraphJob.
> It would be better if we added a GiraphConf (similar to Hive) that is responsible for
handling configuration of the Job.
> i.e.
> public class GiraphJob extends Configuration....
> To simplify config, we should make get/set methods for as many of the parameters as possible.
> We are targeting configuration such as
>   /**
>    * Set the vertex class (required)
>    *
>    * @param vertexClass Runs vertex computation
>    */
>   public final void setVertexClass(Class<?> vertexClass) {
>     getConfiguration().setClass(VERTEX_CLASS, vertexClass, BasicVertex.class);
>   }
>   /**
>    * Set the vertex input format class (required)
>    *
>    * @param vertexInputFormatClass Determines how graph is input
>    */
>   public final void setVertexInputFormatClass(
>       Class<?> vertexInputFormatClass) {
>     getConfiguration().setClass(VERTEX_INPUT_FORMAT_CLASS,
>         vertexInputFormatClass,
>         VertexInputFormat.class);
>   }
>   /**
>    * Set the vertex output format class (optional)
>    *
>    * @param vertexOutputFormatClass Determines how graph is output
>    */
>   public final void setVertexOutputFormatClass(
>       Class<?> vertexOutputFormatClass) {
>     getConfiguration().setClass(VERTEX_OUTPUT_FORMAT_CLASS,
>         vertexOutputFormatClass,
>         VertexOutputFormat.class);
>   }
>   /**
>    * Set the vertex combiner class (optional)
>    *
>    * @param vertexCombinerClass Determines how vertex messages are combined
>    */
>   public final void setVertexCombinerClass(Class<?> vertexCombinerClass) {
>     getConfiguration().setClass(VERTEX_COMBINER_CLASS,
>         vertexCombinerClass,
>         VertexCombiner.class);
>   }
>   /**
>    * Set the graph partitioner class (optional)
>    *
>    * @param graphPartitionerFactoryClass Determines how the graph is partitioned
>    */
>   public final void setGraphPartitionerFactoryClass(
>       Class<?> graphPartitionerFactoryClass) {
>     getConfiguration().setClass(GRAPH_PARTITIONER_FACTORY_CLASS,
>         graphPartitionerFactoryClass,
>         GraphPartitionerFactory.class);
>   }
>   /**
>    * Set the vertex resolver class (optional)
>    *
>    * @param vertexResolverClass Determines how vertex mutations are resolved
>    */
>   public final void setVertexResolverClass(Class<?> vertexResolverClass) {
>     getConfiguration().setClass(VERTEX_RESOLVER_CLASS,
>         vertexResolverClass,
>         VertexResolver.class);
>   }
>   /**
>    * Set the worker context class (optional)
>    *
>    * @param workerContextClass Determines what code is executed on a each
>    *        worker before and after each superstep and computation
>    */
>   public final void setWorkerContextClass(Class<?> workerContextClass) {
>     getConfiguration().setClass(WORKER_CONTEXT_CLASS,
>         workerContextClass,
>         WorkerContext.class);
>   }
> ...etc. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message