Date: Mon, 10 Dec 2012 22:59:31 +0100
Subject: Re: runtimePartitioning in GraphJobRunner
From: Thomas Jungblut <thomas.jungblut@gmail.com>
To: dev@hama.apache.org

Please do me a favor and code how you want the partitioning BSP job to
work before removing everything. I will tell you how to use the readers
without duplicating any graph code, so you don't need to touch the
examples at all.
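For context, this is roughly what such a reader looks like. The sketch below
targets the 0.6-era graph API: parseVertex is the hook named further down in
this thread, while the class name, the tab-separated adjacency-list layout,
and the exact Vertex/Edge signatures are assumptions based on the shipped
examples and may differ.

    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.graph.Edge;
    import org.apache.hama.graph.Vertex;
    import org.apache.hama.graph.VertexInputReader;

    // Turns one record ("vertexID<TAB>neighbor1<TAB>neighbor2...") into a vertex.
    // The raw key/value types (LongWritable offset, Text line) come from whatever
    // InputFormat the job is configured with; the reader itself does not care
    // where the pair came from.
    public class TabSeparatedVertexReader extends
        VertexInputReader<LongWritable, Text, Text, NullWritable, DoubleWritable> {

      @Override
      public boolean parseVertex(LongWritable key, Text value,
          Vertex<Text, NullWritable, DoubleWritable> vertex) {
        String[] split = value.toString().split("\t");
        vertex.setVertexID(new Text(split[0]));
        for (int i = 1; i < split.length; i++) {
          vertex.addEdge(new Edge<Text, NullWritable>(new Text(split[i]), null));
        }
        // true = keep this vertex; a malformed record could return false instead.
        return true;
      }
    }

The same reader works unchanged whether the LongWritable/Text pair comes from
a plain text file or a sequence file, which is the point made further down
the thread.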
2012/12/10 Edward J. Yoon

> Please review
> https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
> first.
>
> * If we have VertexInputReader again, we don't need to apply it to all
> examples. And, random generators and examples should be managed together
> now.
>
> On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut wrote:
>
>> Yes, but in patches and in issue HAMA-531, so we can review.
>>
>> 2012/12/10 Edward J. Yoon
>>
>>> We talked on gtalk; the conclusion is as below:
>>>
>>> "If there's no opinion, I'll remove VertexInputReader in GraphJobRunner,
>>> because it makes the code complex. Let's consider the VertexInputReader
>>> again after fixing the HAMA-531 and HAMA-632 issues."
>>>
>>> I'll clean them up tomorrow.
>>>
>>> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon wrote:
>>>
>>>> Hi Edward, I am assuming that you want to do this because you want to
>>>> run the job using more BSP tasks in parallel, to reduce the memory usage
>>>> per task and perhaps run it faster. Am I right? I am +1 if this makes
>>>> things faster. However, this would be expensive for people with smaller
>>>> clusters, and we should have spill, cache and lookup implemented for
>>>> vertices in such cases.
>>>>
>>>> Regarding backward compatibility, can we use the user's VertexInputReader
>>>> to read the data and then write it in the sequential file format we want?
>>>> I was discussing this with Thomas and we felt this could be done by
>>>> configuring a default input reader and overriding it through
>>>> configuration. We would have to make the Vertex class Writable. I would
>>>> like to keep it backward compatible. Is this a possibility?
>>>>
>>>> Regarding run-time partitioning, not all partitioning would be based on
>>>> hash partitioning. I can have a partitioner based on the color of the
>>>> vertex or some other property of the vertex. It is a step we can skip if
>>>> not configured by the user.
>>>>
>>>> Just my 2 cents. We can deprecate things, but let's not remove them
>>>> immediately.
>>>>
>>>> -Suraj
>>>>
>>>> HAMA-632 can wait until everything is resolved. I am trying to reduce the
>>>> API complexity.
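As an illustration of the non-hash partitioning Suraj mentions above, here is
a sketch written against Hama's Partitioner contract as HashPartitioner
implements it (getPartition over a key/value pair and the number of tasks).
The idea of encoding a color as the first field of the vertex record is
purely hypothetical, and the interface details may vary between versions.

    import org.apache.hadoop.io.Text;
    import org.apache.hama.bsp.Partitioner;

    // Routes records by a vertex property instead of hashing the vertex ID.
    public class ColorPartitioner implements Partitioner<Text, Text> {

      // The value is assumed (hypothetically) to carry the vertex color as its
      // first tab-separated field, e.g. "red<TAB>rest-of-record".
      @Override
      public int getPartition(Text vertexId, Text value, int numTasks) {
        String color = value.toString().split("\t", 2)[0];
        // Vertices of the same color end up on the same BSP task.
        return Math.abs(color.hashCode()) % numTasks;
      }
    }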
>>>> On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut wrote:
>>>>
>>>>> You didn't get the use of the reader.
>>>>> The reader doesn't care about the input format.
>>>>> It just takes the input as Writable, so for Text this is LongWritable/Text
>>>>> pairs. For NoSQL this might be LongWritable/BytesWritable.
>>>>>
>>>>> It's up to you to code this for your input source, not for each format.
>>>>> This is not hardcoded to text, only in the examples.
>>>>>
>>>>> 2012/12/10 Edward J. Yoon
>>>>>
>>>>>> Again ... users can create their own InputFormatter to read records as
>>>>>> <K, V> pairs from a text file or sequence file, or NoSQLs.
>>>>>>
>>>>>> You can use K, V pairs and a sequence file. Why do you want to use a
>>>>>> text file? Should I always write a text file and parse it using
>>>>>> VertexInputReader?
>>>>>>
>>>>>> On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut wrote:
>>>>>>
>>>>>>>> It's a gap in experience, Thomas.
>>>>>>>
>>>>>>> Most probably you should read some good books on data extraction and
>>>>>>> then choose your tools accordingly. I don't think that BSP is, or will
>>>>>>> be, a good extraction technique for unstructured data.
>>>>>>>
>>>>>>> But these are just my two cents here - there seem to be somewhat more
>>>>>>> political problems in this game than using the tools appropriately.
>>>>>>>
>>>>>>> 2012/12/10 Thomas Jungblut
>>>>>>>
>>>>>>>> Yes, if you preprocess your data correctly.
>>>>>>>> I have done the same unstructured extraction with the movie database
>>>>>>>> from IMDB and it worked fine.
>>>>>>>> That's just not a job for BSP, but for MapReduce.
>>>>>>>>
>>>>>>>> 2012/12/10 Edward J. Yoon
>>>>>>>>
>>>>>>>>> It's a gap in experience, Thomas. Do you think you can extract a
>>>>>>>>> Twitter mention graph using parseVertex?
>>>>>>>>>
>>>>>>>>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut wrote:
>>>>>>>>>
>>>>>>>>>> I have trouble understanding you here.
>>>>>>>>>>
>>>>>>>>>>> How can I generate a large sample without coding?
>>>>>>>>>>
>>>>>>>>>> Do you mean random data generation or real-life data?
>>>>>>>>>> Personally I think it is really convenient to transform unstructured
>>>>>>>>>> data in a text file to vertices.
>>>>>>>>>>
>>>>>>>>>> 2012/12/10 Edward
>>>>>>>>>>
>>>>>>>>>>> I mean, with or without an input reader. How can I generate a large
>>>>>>>>>>> sample without coding?
>>>>>>>>>>>
>>>>>>>>>>> It's an unnecessary feature. As I mentioned before, it's only good
>>>>>>>>>>> for simple and small tests.
>>>>>>>>>>>
>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>
>>>>>>>>>>> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut
>>>>>>>>>>> <thomas.jungblut@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> In my case, generating test data is very annoying.
>>>>>>>>>>>>
>>>>>>>>>>>> Really? What is so difficult about generating tab-separated text
>>>>>>>>>>>> data? ;)
>>>>>>>>>>>> I think we shouldn't do this, but there seems to be very little
>>>>>>>>>>>> interest in the community, so I will not block your work on it.
>>>>>>>>>>>>
>>>>>>>>>>>> Good luck ;)
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
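For what it's worth, the tab-separated test data Thomas refers to can be
generated with a few lines of plain Java. The sketch below writes a random
adjacency list (vertex ID followed by its neighbors, one vertex per line);
the file name and sizes are arbitrary, and individual examples may expect
additional fields such as a vertex value.

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Random;

    // Writes "vertexID<TAB>neighbor1<TAB>neighbor2..." per line.
    public class RandomGraphGenerator {

      public static void main(String[] args) throws IOException {
        int numVertices = 100000;  // arbitrary sample size
        int maxOutDegree = 20;
        Random rand = new Random(42L);
        PrintWriter out = new PrintWriter(new FileWriter("random-graph.txt"));
        for (int v = 0; v < numVertices; v++) {
          StringBuilder line = new StringBuilder(Integer.toString(v));
          int degree = rand.nextInt(maxOutDegree) + 1;
          for (int e = 0; e < degree; e++) {
            line.append('\t').append(rand.nextInt(numVertices));
          }
          out.println(line);
        }
        out.close();
      }
    }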