hama-dev mailing list archives

From Suraj Menon <surajsme...@apache.org>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 19:58:10 GMT
Hi Edward, I am assuming that you want to do this because you want to run
the job using more BSP tasks in parallel to reduce the memory usage per
task and perhaps run it faster.
Am I right? I am +1 if this makes things faster. However this would be
expensive for people with smaller clusters, and we should have spill, cache
and lookup implemented for Vertices in such cases.

Regarding backward compatibility, can we use the user's VertexInputReader
to read the data and then write it in the sequence file format we want? I
was discussing this with Thomas and we felt this could be done by
configuring a default input reader and overriding the same by
configuration. We would have to make the Vertex class Writable. I would
like to keep it backward compatible. Is this a possibility?
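
A rough sketch of the conversion idea above, under loud assumptions: every
type here (SimpleWritable, SketchVertex, SketchVertexReader,
TabSeparatedReader, Converter) is a hypothetical stand-in written for
illustration and is not Hama's or Hadoop's API. It only shows a user-supplied
reader parsing text records into vertices that can serialize themselves to a
binary stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal contract, modeled loosely on the idea of a Writable. */
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical vertex that can serialize itself; not Hama's Vertex class. */
final class SketchVertex implements SimpleWritable {
    String id = "";
    List<String> edges = new ArrayList<>();

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(id);
        out.writeInt(edges.size());
        for (String edge : edges) {
            out.writeUTF(edge);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        int count = in.readInt();
        edges = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            edges.add(in.readUTF());
        }
    }
}

/** Hypothetical user-supplied reader: one text record in, one vertex out. */
interface SketchVertexReader {
    SketchVertex parse(String line);
}

/** Assumed default: "vertexId<TAB>neighbor1<TAB>neighbor2..." per line. */
final class TabSeparatedReader implements SketchVertexReader {
    @Override
    public SketchVertex parse(String line) {
        String[] parts = line.split("\t");
        SketchVertex vertex = new SketchVertex();
        vertex.id = parts[0];
        for (int i = 1; i < parts.length; i++) {
            vertex.edges.add(parts[i]);
        }
        return vertex;
    }
}

final class Converter {
    /** Parses each line with the given reader and appends the binary form. */
    static byte[] convert(List<String> lines, SketchVertexReader reader)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String line : lines) {
                reader.parse(line).write(out);
            }
        }
        return bytes.toByteArray();
    }
}
```

In this sketch the default reader could be swapped for a user's own
implementation by configuration, which is the backward-compatible part.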

Regarding run-time partitioning, not all partitioning would be based on
hash partitioning. I can have a partitioner based on color of the vertex or
some other property of the vertex. It is a step we can skip if not
configured by user.
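
To make the point concrete, here is a minimal sketch, assuming hypothetical
types (ColoredVertex, VertexPartitioner, HashVertexPartitioner,
ColorVertexPartitioner, PartitionStep) that are not part of Hama's API. It
shows a hash partitioner next to a property-based one, and a partitioning
step that is skipped when no partitioner is configured.

```java
/** Hypothetical stand-in for a graph vertex; not Hama's Vertex class. */
final class ColoredVertex {
    final String id;
    final String color;

    ColoredVertex(String id, String color) {
        this.id = id;
        this.color = color;
    }
}

/** Hypothetical contract: maps a vertex to a task index in [0, numTasks). */
interface VertexPartitioner<V> {
    int getPartition(V vertex, int numTasks);
}

/** Partitions by the hash of the vertex ID. */
final class HashVertexPartitioner implements VertexPartitioner<ColoredVertex> {
    @Override
    public int getPartition(ColoredVertex vertex, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(vertex.id.hashCode(), numTasks);
    }
}

/** Partitions by a vertex property (its color) instead of its ID. */
final class ColorVertexPartitioner implements VertexPartitioner<ColoredVertex> {
    @Override
    public int getPartition(ColoredVertex vertex, int numTasks) {
        return Math.floorMod(vertex.color.hashCode(), numTasks);
    }
}

final class PartitionStep {
    /**
     * Returns the task that should own the vertex. When no partitioner is
     * configured (null), the step is skipped and the vertex stays on the
     * task that read it.
     */
    static int targetTask(ColoredVertex vertex,
                          VertexPartitioner<ColoredVertex> partitioner,
                          int currentTask,
                          int numTasks) {
        if (partitioner == null) {
            return currentTask;
        }
        return partitioner.getPartition(vertex, numTasks);
    }
}
```

With the color-based partitioner, two vertices of the same color always land
on the same task regardless of their IDs.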

Just my 2 cents. We can deprecate things but let's not remove immediately.

-Suraj

HAMA-632 can wait until everything is resolved. I am trying to reduce the
API complexity.

On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:

> You didn't get the use of the reader.
> The reader doesn't care about the input format.
> It just takes the input as Writable, so for Text this is LongWritable/Text
> pairs. For NoSQL this might be LongWritable/BytesWritable.
>
> It's up to you to code this for your input sequence, not for each format.
> This is not hardcoded to text, only in the examples.
>
> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>
> > Again ... User can create their own InputFormatter to read records as
> > a <Writable, ArrayWritable> from text file or sequence file, or
> > NoSQLs.
> >
> > You can use K, V pairs and sequence file. Why do you want to use text
> > file? Should I always write text file and parse them using
> > VertexInputReader?
> >
> >
> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> > <thomas.jungblut@gmail.com> wrote:
> > >>
> > >> It's a gap in experience, Thomas.
> > >
> > >
> > > > Most probably you should read some good books on data extraction and
> > > > then choose your tools accordingly.
> > > I never think that BSP is and will be a good extraction technique for
> > > unstructured data.
> > >
> > > But these are just my two cents here- there seems to be somewhat more
> > > political problems in this game than using tools appropriately.
> > >
> > > 2012/12/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> > >
> > >> Yes, if you preprocess your data correctly.
> > >> I have done the same unstructured extraction with the movie database
> > >> from IMDB and it worked fine.
> > >> That's just not a job for BSP, but for MapReduce.
> > >>
> > >> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
> > >>
> > >>> It's a gap in experience, Thomas. Do you think you can extract Twitter
> > >>> mention graph using parseVertex?
> > >>>
> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
> > >>> <thomas.jungblut@gmail.com> wrote:
> > >>> > I have trouble understanding you here.
> > >>> >
> > >>> > How can I generate large sample without coding?
> > >>> >
> > >>> >
> > >>> > Do you mean random data generation or real-life data?
> > >>> > Personally I think it is really convenient to transform unstructured
> > >>> > data in a text file to vertices.
> > >>> >
> > >>> >
> > >>> > 2012/12/10 Edward <edward@udanax.org>
> > >>> >
> > >>> >> I mean, With or without input reader. How can I generate large sample
> > >>> >> without coding?
> > >>> >>
> > >>> >> It's unnecessary feature. As I mentioned before, only good for simple
> > >>> >> and small test.
> > >>> >>
> > >>> >> Sent from my iPhone
> > >>> >>
> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <thomas.jungblut@gmail.com>
> > >>> >> wrote:
> > >>> >>
> > >>> >> >>
> > >>> >> >> In my case, generating test data is very annoying.
> > >>> >> >
> > >>> >> >
> > >>> >> > Really? What is so difficult to generate tab separated text data?;)
> > >>> >> > I think we shouldn't do this, but there seems to be very little
> > >>> >> > interest in the community so I will not block your work on it.
> > >>> >> >
> > >>> >> > Good luck ;)
> > >>> >>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards, Edward J. Yoon
> > >>> @eddieyoon
> > >>>
> > >>
> > >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>
