giraph-user mailing list archives

From Peter Morgan <pmorgan...@gmail.com>
Subject Several Queries
Date Sat, 02 Mar 2013 12:31:33 GMT
Hi,

I have several questions/comments relating to various parts of Giraph. I'm
not sure whether they belong here or on the dev mailing list, and they
might have already been answered somewhere.

1) When is there likely to be a new release of Giraph? It's been a while
since the last one, and I know there has been a lot of work on it.

2) I've been using the AccumuloVertexInputFormat and have not found it very
useful. Graph data held in Accumulo is most likely an edge list, so would
it not be better for the Accumulo input format for Giraph to extend
EdgeInputFormat instead? The current example for use with Accumulo is
called AccumuloEdgeInputFormat (which extends AccumuloVertexInputFormat),
which seems badly named. I was also wondering what the behaviour is when
using a VertexInputFormat with an edge list: do vertices created later in
the input override previous ones, so that edges get lost or are never
added?

3) Is there a way to ensure that if the input file doesn't exist, the
Giraph job (and hence the MR job) will exit rather than hang?

4) The SequenceFileReader currently assumes that the key type is going to
be the same as the vertex name type, but in my case this isn't true. Is
there, or could there be, a version which allows the sequence file key and
value types to differ from the Giraph vertex name, state, edge and message
types?

5) I've been having a problem where a Giraph job finishes successfully, but
the process/JVM on each compute node is not killed properly. It just sits
there idle while holding on to the RAM that the job used. This doesn't seem
to be a problem when we run normal MR jobs. We then have to manually kill
each process to release the RAM. Any ideas why this might be happening?
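
To make (3) concrete, the kind of pre-flight check I have in mind is
sketched below. It is only an illustration: InputPreflight and
requireInput are hypothetical names, not part of Giraph, and it checks a
local path with java.nio. For input on HDFS I assume the same check would
go through Hadoop's FileSystem.exists(Path) instead.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical wrapper, not a Giraph class: fail fast before submitting a job.
public class InputPreflight {

    // Returns the input path if it exists, otherwise throws.
    // Assumption: the input is on the local filesystem.
    public static Path requireInput(String location) throws FileNotFoundException {
        Path input = Paths.get(location);
        if (!Files.exists(input)) {
            throw new FileNotFoundException("Input path does not exist: " + location);
        }
        return input;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: InputPreflight <input-path>");
            System.exit(2);
        }
        try {
            requireInput(args[0]);
        } catch (FileNotFoundException e) {
            // Exit with a non-zero status instead of letting the job hang.
            System.err.println(e.getMessage());
            System.exit(1);
        }
        // The Giraph job would be submitted here (not shown).
        System.out.println("Input found, safe to submit the job.");
    }
}
```

Doing this in the driver works around the hang, but I would prefer it if
Giraph itself failed the job in this case.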

Thanks in advance for any help.
Peter
