From: Suraj Menon
To: dev@hama.apache.org
Reply-To: dev@hama.apache.org
Date: Mon, 10 Dec 2012 14:58:10 -0500
Subject: Re: runtimePartitioning in GraphJobRunner
References: <8C476B05-DB98-4F1A-B2BA-495A3F48723A@udanax.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8fb1fd7e179a2304d084fc76

--e89a8fb1fd7e179a2304d084fc76
Content-Type: text/plain; charset=ISO-8859-1

Hi Edward,

I am assuming that you want to do this so that the job can run with more BSP tasks in parallel, which would reduce the memory usage per task and perhaps make it run faster. Am I right? I am +1 if this makes things faster. However, this would be expensive for people with smaller clusters, and we should have spill, cache and lookup implemented for Vertices in such cases.

Regarding backward compatibility, can we use the user's VertexInputReader to read the data and then write it out in the sequence file format we want? I was discussing this with Thomas, and we felt this could be done by configuring a default input reader that can be overridden through configuration. We would have to make the Vertex class Writable. I would like to keep it backward compatible. Is this a possibility?

Regarding run-time partitioning, not all partitioning would be based on hash partitioning. I can have a partitioner based on the color of the vertex or some other property of the vertex. It is a step we can skip if the user has not configured a partitioner.

Just my 2 cents. We can deprecate things, but let's not remove them immediately.

-Suraj

HAMA-632 can wait until everything is resolved. I am trying to reduce the API complexity.
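To illustrate the run-time partitioning point above, here is a minimal sketch of a hash partitioner next to a property-based one, with the partitioning step skipped when nothing is configured. Everything named here (`VertexPartitioner`, `ColoredVertex`, `ColorPartitioner`, `HashVertexPartitioner`, `PartitioningSketch.targetTask`) is hypothetical and defined only for this example; it is not Hama's actual API, and the real GraphJobRunner may handle this differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface for this sketch; it is not Hama's actual partitioner API.
interface VertexPartitioner<V> {
    // Returns the index of the task (0 .. numTasks-1) that should own the vertex.
    int getPartition(V vertex, int numTasks);
}

// Hypothetical vertex type with an ID and a "color" property.
final class ColoredVertex {
    final String id;
    final String color;

    ColoredVertex(String id, String color) {
        this.id = id;
        this.color = color;
    }
}

// Hash partitioning on the vertex ID.
final class HashVertexPartitioner implements VertexPartitioner<ColoredVertex> {
    @Override
    public int getPartition(ColoredVertex vertex, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(vertex.id.hashCode(), numTasks);
    }
}

// Property-based partitioning: vertices of the same color go to the same task.
final class ColorPartitioner implements VertexPartitioner<ColoredVertex> {
    private final List<String> knownColors;

    ColorPartitioner(List<String> knownColors) {
        this.knownColors = knownColors;
    }

    @Override
    public int getPartition(ColoredVertex vertex, int numTasks) {
        int index = knownColors.indexOf(vertex.color);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown color: " + vertex.color);
        }
        return index % numTasks;
    }
}

final class PartitioningSketch {
    // With no partitioner configured, the vertex stays on the task that read it,
    // so the run-time partitioning step is skipped.
    static int targetTask(ColoredVertex vertex, VertexPartitioner<ColoredVertex> partitioner,
                          int readingTask, int numTasks) {
        return partitioner == null ? readingTask : partitioner.getPartition(vertex, numTasks);
    }

    public static void main(String[] args) {
        ColoredVertex v = new ColoredVertex("v1", "green");
        VertexPartitioner<ColoredVertex> byColor =
            new ColorPartitioner(Arrays.asList("red", "green", "blue"));
        System.out.println(targetTask(v, byColor, 0, 2));                     // by color
        System.out.println(targetTask(v, new HashVertexPartitioner(), 0, 2)); // by ID hash
        System.out.println(targetTask(v, null, 0, 2));                        // step skipped
    }
}
```

Passing `null` stands in for "no partitioner configured"; a real job would more likely read that choice from its configuration.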

On Mon, Dec 10, 2012 at 2:56 PM, Thomas Jungblut wrote:

> You didn't get the use of the reader.
> The reader doesn't care about the input format.
> It just takes the input as Writable, so for Text this is LongWritable/Text
> pairs. For NoSQL this might be LongWritable/BytesWritable.
>
> It's up to you coding this for your input sequence, not for each format.
> This is not hardcoded to text, only in the examples.
>
> 2012/12/10 Edward J. Yoon
>
> > Again ... User can create their own InputFormatter to read records as
> > a from text file or sequence file, or
> > NoSQLs.
> >
> > You can use K, V pairs and sequence file. Why do you want to use text
> > file? Should I always write text file and parse them using
> > VertexInputReader?
> >
> >
> > On Tue, Dec 11, 2012 at 4:48 AM, Thomas Jungblut
> > wrote:
> > >>
> > >> It's a gap in experience, Thomas.
> > >
> > >
> > > Most probably you should read some good books on data extraction and then
> > > choose your tools accordingly.
> > > I never think that BSP is and will be a good extraction technique for
> > > unstructured data.
> > >
> > > But these are just my two cents here- there seems to be somewhat more
> > > political problems in this game than using tools appropriately.
> > >
> > > 2012/12/10 Thomas Jungblut
> > >
> > >> Yes, if you preprocess your data correctly.
> > >> I have done the same unstructured extraction with the movie database from
> > >> IMDB and it worked fine.
> > >> That's just not a job for BSP, but for MapReduce.
> > >>
> > >> 2012/12/10 Edward J. Yoon
> > >>
> > >>> It's a gap in experience, Thomas. Do you think you can extract Twitter
> > >>> mention graph using parseVertex?
> > >>>
> > >>> On Tue, Dec 11, 2012 at 4:34 AM, Thomas Jungblut
> > >>> wrote:
> > >>> > I have trouble understanding you here.
> > >>> >
> > >>> > How can I generate large sample without coding?
> > >>> >
> > >>> >
> > >>> > Do you mean random data generation or real-life data?
> > >>> > Personally I think it is really convenient to transform unstructured data
> > >>> > in a text file to vertices.
> > >>> >
> > >>> >
> > >>> > 2012/12/10 Edward
> > >>> >
> > >>> >> I mean, With or without input reader. How can I generate large sample
> > >>> >> without coding?
> > >>> >>
> > >>> >> It's unnecessary feature. As I mentioned before, only good for simple and
> > >>> >> small test.
> > >>> >>
> > >>> >> Sent from my iPhone
> > >>> >>
> > >>> >> On Dec 11, 2012, at 3:38 AM, Thomas Jungblut <thomas.jungblut@gmail.com>
> > >>> >> wrote:
> > >>> >>
> > >>> >> >>
> > >>> >> >> In my case, generating test data is very annoying.
> > >>> >> >
> > >>> >> >
> > >>> >> > Really? What is so difficult to generate tab separated text data?;)
> > >>> >> > I think we shouldn't do this, but there seems to be very little interest in
> > >>> >> > the community so I will not block your work on it.
> > >>> >> >
> > >>> >> > Good luck ;)
> > >>> >>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best Regards, Edward J. Yoon
> > >>> @eddieyoon
> > >>>
> > >>
> > >>
> > >
> > >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>

--e89a8fb1fd7e179a2304d084fc76--