hama-dev mailing list archives

From Suraj Menon <menonsur...@gmail.com>
Subject Re: Re: Loading binary file in Hama (with graph API)
Date Sat, 04 May 2013 00:43:45 GMT
Hi Jiwon,

In 0.6.0, Hama supported all input formats in loadVertices. In 0.6.1, the new
partitioner limited GraphJobRunner to reading only the sequence file format.
The record converter was (painfully) implemented to bridge this gap, and it
should go away once we have a better partitioning design. Today,
VertexInputReader#parseVertex is called inside
VertexInputReader#convertRecord, and VertexInputReader is a RecordConverter.
So it reads vertices in your format and feeds GraphJobRunner the vertices
written in sequence file format. This does duplicate the data the first
time. In the next job submission you can reuse the partitions that were
created, without any converter/VertexInputReader.
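
A rough sketch of that relationship, using simplified stand-in types rather
than the real Hama classes. SimpleVertex, SimpleVertexInputReader,
TabSeparatedReader and ConverterSketch are hypothetical names, the signatures
are assumptions, and an in-memory list stands in for the partition files:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified interface; NOT the real Hama RecordConverter signature.
interface RecordConverter<K, V, OUT> {
    OUT convertRecord(K key, V value);
}

// Simplified stand-in for a vertex record.
final class SimpleVertex {
    final String id;
    final List<String> neighbors = new ArrayList<>();

    SimpleVertex(String id) {
        this.id = id;
    }
}

// The reader *is* a converter: convertRecord simply delegates to parseVertex.
abstract class SimpleVertexInputReader<K, V>
        implements RecordConverter<K, V, SimpleVertex> {

    // User code: turn one raw input record into a vertex.
    abstract SimpleVertex parseVertex(K key, V value);

    @Override
    public final SimpleVertex convertRecord(K key, V value) {
        return parseVertex(key, value);
    }
}

// Example user reader for lines of the form "vertexId<TAB>neighbor<TAB>neighbor".
final class TabSeparatedReader extends SimpleVertexInputReader<Long, String> {
    @Override
    SimpleVertex parseVertex(Long offset, String line) {
        String[] parts = line.split("\t");
        SimpleVertex vertex = new SimpleVertex(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            vertex.neighbors.add(parts[i]);
        }
        return vertex;
    }
}

final class ConverterSketch {
    // Partitioning step: every raw record goes through the converter once.
    // The returned list stands in for the converted partition files.
    static List<SimpleVertex> partition(
            RecordConverter<Long, String, SimpleVertex> converter,
            List<String> lines) {
        List<SimpleVertex> converted = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            converted.add(converter.convertRecord(offset, line));
            offset += line.length() + 1;
        }
        return converted;
    }

    // Graph runner step: it only sees already-converted vertices and
    // never calls parseVertex itself.
    static int loadVertices(List<SimpleVertex> partitionData) {
        return partitionData.size();
    }

    public static void main(String[] args) {
        List<SimpleVertex> converted =
                partition(new TabSeparatedReader(), Arrays.asList("a\tb\tc", "b\ta"));
        System.out.println(loadVertices(converted) + " vertices loaded");
    }
}
```

The point of the sketch is only the call direction: parseVertex runs during
the conversion step, so the loading step never needs the reader.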

Hi Ikhtiyor,

Sounds like a good plan. Please create a JIRA issue and suggest/implement
some more code refactoring for the purpose.

Hi Edward,
No one likes this stop-gap solution. :)

Regards,
Suraj

On Fri, May 3, 2013 at 7:59 PM, Ikhtiyor Ahmedov
<ikhtiyor.ahmedov@gmail.com> wrote:

> As a user, I would like to add that Apache Gora may be a good solution
> for integrating with NoSQL stores.
> On May 4, 2013 8:47 AM, "Edward J. Yoon" <edwardyoon@apache.org> wrote:
>
> > PartitioningRunner rewrites records (converted to VertexWritable) to
> > particular partition files, and then GraphJobRunner reads just
> > VertexWritable.
> >
> > To Hama devs,
> >
> > BTW, I hadn't really thought about 'Range Partitioning' and
> > 'integration with NoSQLs' until just now. And I just found my old
> > opinion[1] on record converter. I didn't like 'Record converter'.
> >
> > 1. http://markmail.org/message/ol32pp2ixfazcxfc
> >
> > On Sat, May 4, 2013 at 7:36 AM, Jiwon Seo <jiwon@stanford.edu> wrote:
> > > Edward, thanks for your reply.
> > >
> > > Right, I checked that PartitioningRunner is the only place that calls the
> > > convertRecord method.
> > >
> > > However, it is not clear how that class is related to the GraphJobRunner
> > > class.
> > > The loadVertices() method in the GraphJobRunner does not call the
> > > convertRecord method as in PartitioningRunner::bsp().
> > >
> > > Is the GraphJobRunner::loadVertices() not used for loading vertices?
> > > If it is used, how is it related to PartitioningRunner::bsp()? It would be
> > > helpful to know the (rough) call stack from PartitioningRunner to
> > > GraphJobRunner (or vice versa).
> > >
> > > Thanks,
> > >
> > > -Jiwon
> > >
> > >> Hi Mr. Seo,
> > >>
> > >> Please look at the VertexInputReader.convertRecord() method. See also
> > >> the PartitioningRunner and RecordConverter classes [1].
> > >>
> > >> 1. http://svn.apache.org/repos/asf/hama/trunk/core/src/main/java/org/apache/hama/bsp/PartitioningRunner
> > >>
> > >> On Fri, May 3, 2013 at 5:49 PM, Jiwon Seo <jiwon@stanford.edu> wrote:
> > >>> Hi,
> > >>>
> > >>> I'm trying to understand how vertex loading is done in Hama.
> > >>>
> > >>> The part that I don't understand is the relation between VertexInputReader
> > >>> and InputFormat.
> > >>>
> > >>> As far as I understand, VertexInputReader.parseVertex is the method to
> > >>> initialize each vertex, but it is not clear where the function is called in
> > >>> Hama 0.6.1.
> > >>> In Hama 0.6.0, the parseVertex function is explicitly called inside
> > >>> GraphJobRunner::loadVertices, but in Hama 0.6.1, it is replaced with
> > >>> peer.readNext(vertex, NullWritable.get()), and parseVertex does not seem to
> > >>> get called. Where is the function called?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> -Jiwon
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
