Date: Tue, 11 Dec 2012 07:21:40 +0900
Subject: Re: runtimePartitioning in GraphJobRunner
From: "Edward J. Yoon"
To: dev@hama.apache.org

> Please do me a favor and code how you want the partitioning BSP job to work
> before removing everything. I will tell you how to use the readers without
> any duplicate graph code so you don't need to touch the examples at all.

You don't need to wait, because it will be almost the same as the
BSPJobClient.partition() method.

On Tue, Dec 11, 2012 at 6:59 AM, Thomas Jungblut wrote:
> Please do me a favor and code how you want the partitioning BSP job to work
> before removing everything. I will tell you how to use the readers without
> any duplicate graph code so you don't need to touch the examples at all.
>
> 2012/12/10 Edward J. Yoon
>
>> Please review
>> https://issues.apache.org/jira/secure/attachment/12560155/patch_v02.txt
>> first.
>>
>> * If we have VertexInputReader again, we don't need to apply it to all
>> examples. And random generators and examples should be managed
>> together now.
>>
>> On Tue, Dec 11, 2012 at 6:52 AM, Thomas Jungblut wrote:
>> > Yes, but in patches and in issue HAMA-531, so we can review them.
>> >
>> > 2012/12/10 Edward J. Yoon
>> >
>> >> We talked on gtalk; the conclusion is as follows:
>> >>
>> >> "If there's no opinion, I'll remove VertexInputReader in
>> >> GraphJobRunner, because it makes the code complex. Let's consider
>> >> the VertexInputReader again after fixing the HAMA-531 and HAMA-632
>> >> issues."
>> >>
>> >> I'll clean them up tomorrow.
>> >>
>> >> On Tue, Dec 11, 2012 at 4:58 AM, Suraj Menon wrote:
>> >> > Hi Edward, I am assuming that you want to do this because you want to
>> >> > run the job using more BSP tasks in parallel to reduce the memory usage
>> >> > per task and perhaps run it faster.
>> >> > Am I right? I am +1 if this makes things faster. However, this would be
>> >> > expensive for people with smaller clusters, and we should have spill,
>> >> > cache and lookup implemented for vertices in such cases.
>> >> >
>> >> > Regarding backward compatibility, can we use the user's
>> >> > VertexInputReader to read the data and then write it in the sequential
>> >> > file format we want? I was discussing this with Thomas and we felt this
>> >> > could be done by configuring a default input reader and overriding it
>> >> > by configuration. We would have to make the Vertex class Writable. I
>> >> > would like to keep it backward compatible. Is this a possibility?
>> >> >
>> >> > Regarding run-time partitioning, not all partitioning would be based on
>> >> > hash partitioning. I can have a partitioner based on the color of the
>> >> > vertex or some other property of the vertex. It is a step we can skip
>> >> > if it is not configured by the user.
>> >> >
>> >> > Just my 2 cents. We can deprecate things, but let's not remove them
>> >> > immediately.
>> >> >
>> >> > -Suraj
>> >> >
>> >> > HAMA-632 can wait until everything is resolved. I am trying to reduce
>> >> > the API complexity.
>> >> >
>> >> > On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut wrote:
>> >> >
>> >> >> You didn't get the use of the reader.
>> >> >> The reader doesn't care about the input format.
>> >> >> It just takes the input as Writable, so for Text this is
>> >> >> LongWritable/Text pairs. For NoSQL this might be
>> >> >> LongWritable/BytesWritable.
>> >> >>
>> >> >> It's up to you to code this for your input sequence, not for each
>> >> >> format.
>> >> >> This is not hardcoded to text, only in the examples.
>> >> >>
>> >> >> 2012/12/10 Edward J. Yoon
>> >> >>
>> >> >> > Again ... Users can create their own InputFormatter to read records
>> >> >> > from a text file, a sequence file, or NoSQL stores.
>> >> >> >
>> >> >> > You can use K, V pairs and a sequence file. Why do you want to use a
>> >> >> > text file? Should I always write a text file and parse it using
>> >> >> > VertexInputReader?
>> >> >> >
>> >> >> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut wrote:
>> >> >> >
>> >> >> > >> It's a gap in experience, Thomas.
>> >> >> > >
>> >> >> > > Most probably you should read some good books on data extraction
>> >> >> > > and then choose your tools accordingly.
>> >> >> > > I don't think that BSP is, or ever will be, a good extraction
>> >> >> > > technique for unstructured data.
>> >> >> > >
>> >> >> > > But these are just my two cents here; there seem to be more
>> >> >> > > political problems in this game than problems of using tools
>> >> >> > > appropriately.
>> >> >> > >
>> >> >> > > 2012/12/10 Thomas Jungblut
>> >> >> > >
>> >> >> > >> Yes, if you preprocess your data correctly.
>> >> >> > >> I have done the same unstructured extraction with the movie
>> >> >> > >> database from IMDB and it worked fine.
>> >> >> > >> That's just not a job for BSP, but for MapReduce.
>> >> >> > >>
>> >> >> > >> 2012/12/10 Edward J. Yoon
>> >> >> > >>
>> >> >> > >>> It's a gap in experience, Thomas. Do you think you can extract
>> >> >> > >>> a Twitter mention graph using parseVertex?
>> >> >> > >>>
>> >> >> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut wrote:
>> >> >> > >>> > I have trouble understanding you here.
>> >> >> > >>> >
>> >> >> > >>> > How can I generate a large sample without coding?
>> >> >> > >>> >
>> >> >> > >>> > Do you mean random data generation or real-life data?
>> >> >> > >>> > Personally, I think it is really convenient to transform
>> >> >> > >>> > unstructured data in a text file into vertices.
>> >> >> > >>> >
>> >> >> > >>> > 2012/12/10 Edward
>> >> >> > >>> >
>> >> >> > >>> >> I mean, with or without an input reader, how can I generate a
>> >> >> > >>> >> large sample without coding?
>> >> >> > >>> >>
>> >> >> > >>> >> It's an unnecessary feature. As I mentioned before, it's only
>> >> >> > >>> >> good for simple and small tests.
>> >> >> > >>> >>
>> >> >> > >>> >> Sent from my iPhone
>> >> >> > >>> >>
>> >> >> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut
>> >> >> > >>> >> <thomas.jungblut@gmail.com> wrote:
>> >> >> > >>> >>
>> >> >> > >>> >> >> In my case, generating test data is very annoying.
>> >> >> > >>> >> >
>> >> >> > >>> >> > Really? What is so difficult about generating tab-separated
>> >> >> > >>> >> > text data? ;)
>> >> >> > >>> >> > I think we shouldn't do this, but there seems to be very
>> >> >> > >>> >> > little interest in the community, so I will not block your
>> >> >> > >>> >> > work on it.
>> >> >> > >>> >> >
>> >> >> > >>> >> > Good luck ;)
>> >> >> > >>> >>
>> >> >> > >>>
>> >> >> > >>> --
>> >> >> > >>> Best Regards, Edward J. Yoon
>> >> >> > >>> @eddieyoon
>> >> >> > >>>
>> >> >> > >>
>> >> >> > >
>> >> >> >
>> >> >> > --
>> >> >> > Best Regards, Edward J. Yoon
>> >> >> > @eddieyoon
>> >> >> >
>> >> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
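Suraj's point that run-time partitioning need not be hash-based can be illustrated with a small, self-contained Java sketch. Everything in it (SimpleVertex, VertexPartitioner, IdHashPartitioner, ColorPartitioner, assign) is hypothetical and is not Hama's actual Vertex or Partitioner API. It only shows how a pluggable strategy could place vertices on tasks either by hashing the vertex id or by hashing a vertex property such as its color.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these types are Hama APIs.
public class PartitionSketch {

    // Minimal stand-in for a graph vertex: an id plus one user-defined property.
    static final class SimpleVertex {
        final String id;
        final String color;

        SimpleVertex(String id, String color) {
            this.id = id;
            this.color = color;
        }
    }

    // Pluggable strategy: maps a vertex to a task index in [0, numTasks).
    interface VertexPartitioner {
        int partition(SimpleVertex v, int numTasks);
    }

    // Default strategy: hash the vertex id; masking the sign bit keeps the index non-negative.
    static final class IdHashPartitioner implements VertexPartitioner {
        @Override
        public int partition(SimpleVertex v, int numTasks) {
            return (v.id.hashCode() & Integer.MAX_VALUE) % numTasks;
        }
    }

    // Property-based strategy: vertices with the same color always land on the same task.
    static final class ColorPartitioner implements VertexPartitioner {
        @Override
        public int partition(SimpleVertex v, int numTasks) {
            return (v.color.hashCode() & Integer.MAX_VALUE) % numTasks;
        }
    }

    // Groups vertex ids into per-task buckets using the configured partitioner.
    static Map<Integer, List<String>> assign(List<SimpleVertex> vertices,
                                             VertexPartitioner partitioner,
                                             int numTasks) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (SimpleVertex v : vertices) {
            int task = partitioner.partition(v, numTasks);
            buckets.computeIfAbsent(task, k -> new ArrayList<>()).add(v.id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<SimpleVertex> vertices = List.of(
                new SimpleVertex("a", "red"), new SimpleVertex("b", "red"),
                new SimpleVertex("c", "blue"), new SimpleVertex("d", "blue"));
        // Prints {0=[b, d], 1=[a, c]}
        System.out.println(assign(vertices, new IdHashPartitioner(), 2));
        // Prints {0=[c, d], 1=[a, b]}
        System.out.println(assign(vertices, new ColorPartitioner(), 2));
    }
}
```

With the color-based strategy every vertex sharing a property lands on the same task, which may cut cross-task messages for some algorithms but can also skew task sizes; hashing the id tends to spread vertices more evenly.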