hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 22:21:40 GMT
> Please do me a favor and code how you want the partitioning BSP job to work
> before removing everything. I will tell you how to use the readers without
> any graph duplicate code so you don't need to touch the examples at all.

You don't need to wait, because it will be almost the same as the
BSPJobClient.partition() method.
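
For readers following the run-time partitioning discussion in the quoted
thread below, here is a minimal sketch of the two options mentioned there:
hash partitioning on the vertex ID, and partitioning on some other vertex
property such as color. The names VertexPartitioning, VertexPartitioner,
Vertex, byIdHash and byColor are hypothetical and are not Hama's API; the
sketch only illustrates mapping a vertex to one of numTasks tasks.

```java
/** Hypothetical sketch of pluggable vertex partitioning; not Hama's API. */
final class VertexPartitioning {

  /** Maps a vertex to a task index in the range [0, numTasks). */
  interface VertexPartitioner<V> {
    int partitionOf(V vertex, int numTasks);
  }

  /** Minimal stand-in vertex with an ID and one extra property. */
  static final class Vertex {
    final String id;
    final String color;

    Vertex(String id, String color) {
      this.id = id;
      this.color = color;
    }
  }

  /** Hash partitioning on the vertex ID. */
  static VertexPartitioner<Vertex> byIdHash() {
    // floorMod keeps the result non-negative even for negative hash codes.
    return (v, numTasks) -> Math.floorMod(v.id.hashCode(), numTasks);
  }

  /** Property-based partitioning: vertices of one color share a task. */
  static VertexPartitioner<Vertex> byColor() {
    return (v, numTasks) -> Math.floorMod(v.color.hashCode(), numTasks);
  }

  public static void main(String[] args) {
    Vertex a = new Vertex("v1", "red");
    Vertex b = new Vertex("v2", "red");
    int numTasks = 4;
    System.out.println("v1 by id:    " + byIdHash().partitionOf(a, numTasks));
    System.out.println("v2 by id:    " + byIdHash().partitionOf(b, numTasks));
    System.out.println("v1 by color: " + byColor().partitionOf(a, numTasks));
    System.out.println("v2 by color: " + byColor().partitionOf(b, numTasks));
  }
}
```

In a design like this the partitioner would be chosen through configuration,
and the partitioning step could be skipped when none is configured, which is
the behavior suggested in the thread.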

On Tue, Dec 11, 2012 at 6:59 AM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:
> Please do me a favor and code how you want the partitioning BSP job to work
> before removing everything. I will tell you how to use the readers without
> any graph duplicate code so you don't need to touch the examples at all.
>
> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>
>> Please review
>> https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
>> first.
>>
>> * If we have VertexInputReader again, we don't need to apply it to all
>> examples. And, random generators and examples should be managed
>> together now.
>>
>> On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut
>> <thomas.jungblut@gmail.com> wrote:
>> > Yes, but in patches and in Issue Hama-531, so we can review.
>> >
>> > 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>> >
>> >> We talked on gtalk, the conclusion is as below:
>> >>
>> >> "If there's no opinion, I'll remove VertexInputReader in
>> >> GraphJobRunner, because it makes the code complex. Let's consider
>> >> VertexInputReader again after fixing the HAMA-531 and HAMA-632
>> >> issues."
>> >>
>> >> I'll clean them up tomorrow.
>> >>
>> >> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <surajsmenon@apache.org>
>> >> wrote:
>> >> > Hi Edward, I am assuming that you want to do this because you want to
>> >> > run the job using more BSP tasks in parallel to reduce the memory
>> >> > usage per task and perhaps run it faster.
>> >> > Am I right? I am +1 if this makes things faster. However this would be
>> >> > expensive for people with smaller clusters, and we should have spill,
>> >> > cache and lookup implemented for Vertices in such cases.
>> >> >
>> >> > Regarding backward compatibility, can we use the user's
>> >> > VertexInputReader to read the data and then write it in the
>> >> > sequential file format we want? I was discussing this with Thomas and
>> >> > we felt this could be done by configuring a default input reader and
>> >> > overriding the same by configuration. We would have to make the Vertex
>> >> > class Writable. I would like to keep it backward compatible. Is this a
>> >> > possibility?
>> >> >
>> >> > Regarding run-time partitioning, not all partitioning would be based
>> >> > on hash partitioning. I can have a partitioner based on color of the
>> >> > vertex or some other property of the vertex. It is a step we can skip
>> >> > if not configured by user.
>> >> >
>> >> > Just my 2 cents. We can deprecate things but let's not remove
>> >> > immediately.
>> >> >
>> >> > -Suraj
>> >> >
>> >> > HAMA-632 can wait until everything is resolved. I am trying to reduce
>> >> > the API complexity.
>> >> >
>> >> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
>> >> > <thomas.jungblut@gmail.com> wrote:
>> >> >
>> >> >> You didn't get the use of the reader.
>> >> >> The reader doesn't care about the input format.
>> >> >> It just takes the input as Writable, so for Text this is
>> >> >> LongWritable/Text pairs. For NoSQL this might be
>> >> >> LongWritable/BytesWritable.
>> >> >>
>> >> >> It's up to you to code this for your input sequence, not for each
>> >> >> format. This is not hardcoded to text, only in the examples.
>> >> >>
>> >> >> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>> >> >>
>> >> >> > Again ... User can create their own InputFormatter to read records
>> >> >> > as a <Writable, ArrayWritable> from text file or sequence file, or
>> >> >> > NoSQLs.
>> >> >> >
>> >> >> > You can use K, V pairs and sequence file. Why do you want to use
>> >> >> > text file? Should I always write text file and parse them using
>> >> >> > VertexInputReader?
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
>> >> >> > <thomas.jungblut@gmail.com> wrote:
>> >> >> > >>
>> >> >> > >> It's a gap in experience, Thomas.
>> >> >> > >
>> >> >> > >
>> >> >> > > Most probably you should read some good books on data extraction
>> >> >> > > and then choose your tools accordingly.
>> >> >> > > I don't think that BSP is or will be a good extraction technique
>> >> >> > > for unstructured data.
>> >> >> > >
>> >> >> > > But these are just my two cents here; there seem to be somewhat
>> >> >> > > more political problems in this game than using tools appropriately.
>> >> >> > >
>> >> >> > > 2012/12/10 Thomas Jungblut <thomas.jungblut@gmail.com>
>> >> >> > >
>> >> >> > >> Yes, if you preprocess your data correctly.
>> >> >> > >> I have done the same unstructured extraction with the movie
>> >> >> > >> database from IMDB and it worked fine.
>> >> >> > >> That's just not a job for BSP, but for MapReduce.
>> >> >> > >>
>> >> >> > >> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>> >> >> > >>
>> >> >> > >>> It's a gap in experience, Thomas. Do you think you can extract
>> >> >> > >>> Twitter mention graph using parseVertex?
>> >> >> > >>>
>> >> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>> >> >> > >>> <thomas.jungblut@gmail.com> wrote:
>> >> >> > >>> > I have trouble understanding you here.
>> >> >> > >>> >
>> >> >> > >>> > How can I generate large sample without coding?
>> >> >> > >>> >
>> >> >> > >>> >
>> >> >> > >>> > Do you mean random data generation or real-life data?
>> >> >> > >>> > Personally I think it is really convenient to transform
>> >> >> > >>> > unstructured data in a text file to vertices.
>> >> >> > >>> >
>> >> >> > >>> >
>> >> >> > >>> > 2012/12/10 Edward <edward@udanax.org>
>> >> >> > >>> >
>> >> >> > >>> >> I mean, With or without input reader. How can I generate
>> >> >> > >>> >> large sample without coding?
>> >> >> > >>> >>
>> >> >> > >>> >> It's unnecessary feature. As I mentioned before, only good for
>> >> >> > >>> >> simple and small test.
>> >> >> > >>> >>
>> >> >> > >>> >> Sent from my iPhone
>> >> >> > >>> >>
>> >> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut
>> >> >> > >>> >> <thomas.jungblut@gmail.com> wrote:
>> >> >> > >>> >>
>> >> >> > >>> >> >>
>> >> >> > >>> >> >> In my case, generating test data is very annoying.
>> >> >> > >>> >> >
>> >> >> > >>> >> >
>> >> >> > >>> >> > Really? What is so difficult to generate tab separated text
>> >> >> > >>> >> > data?;)
>> >> >> > >>> >> > I think we shouldn't do this, but there seems to be very
>> >> >> > >>> >> > little interest in the community so I will not block your
>> >> >> > >>> >> > work on it.
>> >> >> > >>> >> >
>> >> >> > >>> >> > Good luck ;)
>> >> >> > >>> >>
>> >> >> > >>>
>> >> >> > >>>
>> >> >> > >>>
>> >> >> > >>> --
>> >> >> > >>> Best Regards, Edward J. Yoon
>> >> >> > >>> @eddieyoon
>> >> >> > >>>
>> >> >> > >>
>> >> >> > >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Best Regards, Edward J. Yoon
>> >> >> > @eddieyoon
>> >> >> >
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
