Date: Mon, 10 Dec 2012 22:59:31 +0100
Subject: Re: runtimePartitioning in GraphJobRunner
From: Thomas Jungblut <thomas.jungblut@gmail.com>
To: dev@hama.apache.org

Please do me a favor and code how you want the partitioning BSP job to
work before removing everything. I will tell you how to use the readers
without duplicating any graph code, so you don't need to touch the
examples at all.
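For context, this is roughly what such a reader looks like. The sketch below
targets the 0.6-era graph API: parseVertex is the hook named further down in
this thread, while the class name, the tab-separated adjacency-list layout,
and the exact Vertex/Edge signatures are assumptions based on the shipped
examples and may differ.

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.graph.Edge;
    import org.apache.hama.graph.Vertex;
    import org.apache.hama.graph.VertexInputReader;

    // Turns one record ("vertexID<TAB>neighbor1<TAB>neighbor2...") into a vertex.
    // The raw key/value types (LongWritable offset, Text line) come from whatever
    // InputFormat the job is configured with; the reader itself does not care
    // where the pair came from.
    public class TabSeparatedVertexReader extends
        VertexInputReader<LongWritable, Text, Text, NullWritable, DoubleWritable> {

      @Override
      public boolean parseVertex(LongWritable key, Text value,
          Vertex<Text, NullWritable, DoubleWritable> vertex) {
        String[] split = value.toString().split("\t");
        vertex.setVertexID(new Text(split[0]));
        for (int i = 1; i < split.length; i++) {
          vertex.addEdge(new Edge<Text, NullWritable>(new Text(split[i]), null));
        }
        // true = keep this vertex; a malformed record could return false instead.
        return true;
      }
    }

The same reader works unchanged whether the LongWritable/Text pair comes from
a plain text file or a sequence file, which is the point made further down
the thread.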
2012/12/10 Edward J. Yoon

> Please review
> https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
> first.
>
> * If we have VertexInputReader again, we don't need to apply it to all
> examples. And, random generators and examples should be managed together
> now.
>
> On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut wrote:
>
>> Yes, but in patches and in issue HAMA-531, so we can review.
>>
>> 2012/12/10 Edward J. Yoon
>>
>>> We talked on gtalk; the conclusion is as below:
>>>
>>> "If there's no opinion, I'll remove VertexInputReader in GraphJobRunner,
>>> because it makes the code complex. Let's consider the VertexInputReader
>>> again after fixing the HAMA-531 and HAMA-632 issues."
>>>
>>> I'll clean them up tomorrow.
>>>
>>> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon wrote:
>>>
>>>> Hi Edward, I am assuming that you want to do this because you want to
>>>> run the job using more BSP tasks in parallel, to reduce the memory usage
>>>> per task and perhaps run it faster. Am I right? I am +1 if this makes
>>>> things faster. However, this would be expensive for people with smaller
>>>> clusters, and we should have spill, cache and lookup implemented for
>>>> vertices in such cases.
>>>>
>>>> Regarding backward compatibility, can we use the user's VertexInputReader
>>>> to read the data and then write it in the sequential file format we want?
>>>> I was discussing this with Thomas and we felt this could be done by
>>>> configuring a default input reader and overriding it through
>>>> configuration. We would have to make the Vertex class Writable. I would
>>>> like to keep it backward compatible. Is this a possibility?
>>>>
>>>> Regarding run-time partitioning, not all partitioning would be based on
>>>> hash partitioning. I can have a partitioner based on the color of the
>>>> vertex or some other property of the vertex. It is a step we can skip if
>>>> not configured by the user.
>>>>
>>>> Just my 2 cents. We can deprecate things, but let's not remove them
>>>> immediately.
>>>>
>>>> -Suraj
>>>>
>>>> HAMA-632 can wait until everything is resolved. I am trying to reduce the
>>>> API complexity.
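As an illustration of the non-hash partitioning Suraj mentions above, here is
a sketch written against Hama's Partitioner contract as HashPartitioner
implements it (getPartition over a key/value pair and the number of tasks).
The idea of encoding a color as the first field of the vertex record is
purely hypothetical, and the interface details may vary between versions.

    import org.apache.hadoop.io.Text;
    import org.apache.hama.bsp.Partitioner;

    // Routes records by a vertex property instead of hashing the vertex ID.
    public class ColorPartitioner implements Partitioner<Text, Text> {

      // The value is assumed (hypothetically) to carry the vertex color as its
      // first tab-separated field, e.g. "red<TAB>rest-of-record".
      @Override
      public int getPartition(Text vertexId, Text value, int numTasks) {
        String color = value.toString().split("\t", 2)[0];
        // Vertices of the same color end up on the same BSP task.
        return Math.abs(color.hashCode()) % numTasks;
      }
    }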
>>>> On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut wrote:
>>>>
>>>>> You didn't get the use of the reader.
>>>>> The reader doesn't care about the input format.
>>>>> It just takes the input as Writable, so for Text this is LongWritable/Text
>>>>> pairs. For NoSQL this might be LongWritable/BytesWritable.
>>>>>
>>>>> It's up to you to code this for your input source, not for each format.
>>>>> This is not hardcoded to text, only in the examples.
>>>>>
>>>>> 2012/12/10 Edward J. Yoon
>>>>>
>>>>>> Again ... users can create their own InputFormatter to read records as
>>>>>> <K, V> pairs from a text file or sequence file, or NoSQLs.
>>>>>>
>>>>>> You can use K, V pairs and a sequence file. Why do you want to use a
>>>>>> text file? Should I always write a text file and parse it using
>>>>>> VertexInputReader?
>>>>>>
>>>>>> On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut wrote:
>>>>>>
>>>>>>>> It's a gap in experience, Thomas.
>>>>>>>
>>>>>>> Most probably you should read some good books on data extraction and
>>>>>>> then choose your tools accordingly. I don't think that BSP is, or will
>>>>>>> be, a good extraction technique for unstructured data.
>>>>>>>
>>>>>>> But these are just my two cents here - there seem to be somewhat more
>>>>>>> political problems in this game than using the tools appropriately.
>>>>>>>
>>>>>>> 2012/12/10 Thomas Jungblut
>>>>>>>
>>>>>>>> Yes, if you preprocess your data correctly.
>>>>>>>> I have done the same unstructured extraction with the movie database
>>>>>>>> from IMDB and it worked fine.
>>>>>>>> That's just not a job for BSP, but for MapReduce.
>>>>>>>>
>>>>>>>> 2012/12/10 Edward J. Yoon
>>>>>>>>
>>>>>>>>> It's a gap in experience, Thomas. Do you think you can extract a
>>>>>>>>> Twitter mention graph using parseVertex?
>>>>>>>>>
>>>>>>>>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut wrote:
>>>>>>>>>
>>>>>>>>>> I have trouble understanding you here.
>>>>>>>>>>
>>>>>>>>>>> How can I generate a large sample without coding?
>>>>>>>>>>
>>>>>>>>>> Do you mean random data generation or real-life data?
>>>>>>>>>> Personally I think it is really convenient to transform unstructured
>>>>>>>>>> data in a text file to vertices.
>>>>>>>>>>
>>>>>>>>>> 2012/12/10 Edward
>>>>>>>>>>
>>>>>>>>>>> I mean, with or without an input reader. How can I generate a large
>>>>>>>>>>> sample without coding?
>>>>>>>>>>>
>>>>>>>>>>> It's an unnecessary feature. As I mentioned before, it's only good
>>>>>>>>>>> for simple and small tests.
>>>>>>>>>>>
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>
>>>>>>>>>>> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut
>>>>>>>>>>> <thomas.jungblut@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> In my case, generating test data is very annoying.
>>>>>>>>>>>>
>>>>>>>>>>>> Really? What is so difficult about generating tab-separated text
>>>>>>>>>>>> data? ;)
>>>>>>>>>>>> I think we shouldn't do this, but there seems to be very little
>>>>>>>>>>>> interest in the community, so I will not block your work on it.
>>>>>>>>>>>>
>>>>>>>>>>>> Good luck ;)
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
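For what it's worth, the tab-separated test data Thomas refers to can be
generated with a few lines of plain Java. The sketch below writes a random
adjacency list (vertex ID followed by its neighbors, one vertex per line);
the file name and sizes are arbitrary, and individual examples may expect
additional fields such as a vertex value.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Random;

    // Writes "vertexID<TAB>neighbor1<TAB>neighbor2..." per line.
    public class RandomGraphGenerator {

      public static void main(String[] args) throws IOException {
        int numVertices = 100000;  // arbitrary sample size
        int maxOutDegree = 20;
        Random rand = new Random(42L);
        PrintWriter out = new PrintWriter(new FileWriter("random-graph.txt"));
        for (int v = 0; v < numVertices; v++) {
          StringBuilder line = new StringBuilder(Integer.toString(v));
          int degree = rand.nextInt(maxOutDegree) + 1;
          for (int e = 0; e < degree; e++) {
            line.append('\t').append(rand.nextInt(numVertices));
          }
          out.println(line);
        }
        out.close();
      }
    }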