hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: runtimePartitioning in GraphJobRunner
Date Mon, 10 Dec 2012 21:57:07 GMT
Please review https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
first.

* If we bring VertexInputReader back later, we don't need to apply it to
all examples. Also, the random generators and examples should now be
managed together.

On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:
> Yes, but in patches and in issue HAMA-531, so we can review.
>
> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>
>> We talked on gtalk, the conclusion is as below:
>>
>> "If there's no opinion, I'll remove VertexInputReader in
>> GraphJobRunner, because it makes the code complex. Let's consider again
>> about the VertexInputReader, after fixing HAMA-531 and HAMA-632
>> issues."
>>
>> I'll clean them up tomorrow.
>>
>> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon <surajsmenon@apache.org>
>> wrote:
>> > Hi Edward, I am assuming that you want to do this because you want to run
>> > the job using more BSP tasks in parallel to reduce the memory usage per
>> > task and perhaps run it faster.
>> > Am I right? I am +1 if this makes things faster. However this would be
>> > expensive for people with smaller clusters, and we should have spill,
>> > cache and lookup implemented for Vertices in such cases.
>> >
>> > Regarding backward compatibility, can we use the user's VertexInputReader
>> > to read the data and then write them in the sequential file format we want?
>> > I was discussing this with Thomas and we felt this could be done by
>> > configuring a default input reader and overriding the same by
>> > configuration. We would have to make the Vertex class Writable. I would
>> > like to keep it backward compatible. Is this a possibility?
>> >
>> > Regarding run-time partitioning, not all partitioning would be based on
>> > hash partitioning. I can have a partitioner based on color of the vertex
>> > or some other property of the vertex. It is a step we can skip if not
>> > configured by user.
>> >
>> > Just my 2 cents. We can deprecate things but let's not remove
>> > immediately.
>> >
>> > -Suraj
>> >
>> > HAMA-632 can wait until everything is resolved. I am trying to reduce the
>> > API complexity.
>> >
>> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut
>> > <thomas.jungblut@gmail.com> wrote:
>> >
>> >> You didn't get the use of the reader.
>> >> The reader doesn't care about the input format.
>> >> It just takes the input as Writable, so for Text this is
>> >> LongWritable/Text pairs. For NoSQL this might be LongWritable/BytesWritable.
>> >>
>> >> It's up to you to code this for your input sequence, not for each format.
>> >> This is not hardcoded to text, only in the examples.
>> >>
>> >> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>> >>
>> >> > Again ... User can create their own InputFormatter to read records as
>> >> > a <Writable, ArrayWritable> from text file or sequence file, or
>> >> > NoSQLs.
>> >> >
>> >> > You can use K, V pairs and sequence file. Why do you want to use text
>> >> > file? Should I always write text file and parse them using
>> >> > VertexInputReader?
>> >> >
>> >> >
>> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
>> >> > <thomas.jungblut@gmail.com> wrote:
>> >> > >>
>> >> > >> It's a gap in experience, Thomas.
>> >> > >
>> >> > >
>> >> > > Most probably you should read some good books on data extraction and
>> >> > > then choose your tools accordingly.
>> >> > > I never think that BSP is and will be a good extraction technique for
>> >> > > unstructured data.
>> >> > >
>> >> > > But these are just my two cents here- there seems to be somewhat more
>> >> > > political problems in this game than using tools appropriately.
>> >> > >
>> >> > > 2012/12/10 Thomas Jungblut <thomas.jungblut@gmail.com>
>> >> > >
>> >> > >> Yes, if you preprocess your data correctly.
>> >> > >> I have done the same unstructured extraction with the movie database
>> >> > >> from IMDB and it worked fine.
>> >> > >> That's just not a job for BSP, but for MapReduce.
>> >> > >>
>> >> > >> 2012/12/10 Edward J. Yoon <edwardyoon@apache.org>
>> >> > >>
>> >> > >>> It's a gap in experience, Thomas. Do you think you can extract
>> >> > >>> Twitter mention graph using parseVertex?
>> >> > >>>
>> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
>> >> > >>> <thomas.jungblut@gmail.com> wrote:
>> >> > >>> > I have trouble understanding you here.
>> >> > >>> >
>> >> > >>> > How can I generate large sample without coding?
>> >> > >>> >
>> >> > >>> >
>> >> > >>> > Do you mean random data generation or real-life data?
>> >> > >>> > Personally I think it is really convenient to transform unstructured
>> >> > >>> > data in a text file to vertices.
>> >> > >>> >
>> >> > >>> >
>> >> > >>> > 2012/12/10 Edward <edward@udanax.org>
>> >> > >>> >
>> >> > >>> >> I mean, With or without input reader. How can I generate large
>> >> > >>> >> sample without coding?
>> >> > >>> >>
>> >> > >>> >> It's unnecessary feature. As I mentioned before, only good for
>> >> > >>> >> simple and small test.
>> >> > >>> >>
>> >> > >>> >> Sent from my iPhone
>> >> > >>> >>
>> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <thomas.jungblut@gmail.com>
>> >> > >>> >> wrote:
>> >> > >>> >>
>> >> > >>> >> >>
>> >> > >>> >> >> In my case, generating test data is very annoying.
>> >> > >>> >> >
>> >> > >>> >> >
>> >> > >>> >> > Really? What is so difficult to generate tab separated text
>> >> > >>> >> > data?;)
>> >> > >>> >> > I think we shouldn't do this, but there seems to be very little
>> >> > >>> >> > interest in the community so I will not block your work on it.
>> >> > >>> >> >
>> >> > >>> >> > Good luck ;)
>> >> > >>> >>
>> >> > >>>
>> >> > >>>
>> >> > >>>
>> >> > >>> --
>> >> > >>> Best Regards, Edward J. Yoon
>> >> > >>> @eddieyoon
>> >> > >>>
>> >> > >>
>> >> > >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best Regards, Edward J. Yoon
>> >> > @eddieyoon
>> >> >
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
