Subject: Re: Partitioning
From: Thomas Jungblut
To: user@hama.apache.org
Date: Wed, 5 Dec 2012 13:56:37 +0100

Exactly. Maybe you want to first read up on all the different modes and how they are configured:
http://wiki.apache.org/hama/GettingStarted#Modes

We also have some nice documentation as PDF, which you can get here:
http://wiki.apache.org/hama/GettingStarted#Hama_0.6.0

The configuration property that changes the number of tasks on every host is "bsp.tasks.maximum", described as "The maximum number of BSP tasks that will be run simultaneously by a groom server." Setting it to 1 on every host where a groom server starts and then restarting your cluster should do what you want to achieve. I can recommend Puppet for maintaining these kinds of configurations.

If you need a more formal complexity model for BSP applications, let me know; I have derived one from Rob Bisseling's BSP model that better fits Apache Hama's style of computation.
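On each groom host the property goes into the usual Hadoop-style configuration stanza. I am assuming the standard conf/hama-site.xml location here, so adjust the path to your install:

    <!-- conf/hama-site.xml on every groom host (path assumed, adjust to your layout) -->
    <property>
      <name>bsp.tasks.maximum</name>
      <value>1</value>
      <description>The maximum number of BSP tasks that will be run
        simultaneously by a groom server.</description>
    </property>

After the file has been changed on all hosts, restart the cluster so the groom servers pick up the new slot count.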
2012/12/5 Benedikt Elser

> Ah, local mode, Bingo!
>
> About the communication costs: Yes, I am aware of these; however, this is
> exactly what I want to test in the first place :) Hence I would need a
> bsp.distributed.tasks.maximum
>
> Thanks for the clarifications,
>
> Benedikt
>
> On Dec 5, 2012, at 12:05 PM, Thomas Jungblut wrote:
>
> > Because the property is called "local". This doesn't affect the
> > distributed mode.
> > Note that it is really bad if you compute multiple tasks on different
> > host machines, because this increases your communication costs.
> >
> > 2012/12/5 Benedikt Elser
> >
> >> Thank you, I will try that. However, if I set bsp.local.tasks.maximum
> >> to 1, why doesn't it distribute one task to each machine?
> >>
> >> On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
> >>
> >>> So it will spawn 12 tasks. If this doesn't satisfy the load on your
> >>> machines, try to use smaller block sizes.
> >>>
> >>> 2012/12/5 Benedikt Elser
> >>>
> >>>> Hi,
> >>>>
> >>>> thanks for your reply!
> >>>>
> >>>> Total size: 49078776 B
> >>>> Total dirs: 1
> >>>> Total files: 12
> >>>> Total blocks (validated): 12 (avg. block size 4089898 B)
> >>>>
> >>>> Benedikt
> >>>>
> >>>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
> >>>>
> >>>>> So how many blocks does your data have in HDFS?
> >>>>>
> >>>>> 2012/12/5 Benedikt Elser
> >>>>>
> >>>>>> Hi List,
> >>>>>>
> >>>>>> I am using the hama-0.6.0 release to run graph jobs on various input
> >>>>>> graphs in an EC2-based cluster of size 12. However, as I see in the
> >>>>>> logs, not every node in the cluster contributes to that job (they
> >>>>>> have no tasklog/job dir and are idle). Theoretically, a distribution
> >>>>>> of 1 million nodes across 12 buckets should hit every node at least
> >>>>>> once. Therefore I think it's a configuration problem. So far I
> >>>>>> messed around with these settings:
> >>>>>>
> >>>>>> bsp.max.tasks.per.job
> >>>>>> bsp.local.tasks.maximum
> >>>>>> bsp.tasks.maximum
> >>>>>> bsp.child.java.opts
> >>>>>>
> >>>>>> Setting bsp.local.tasks.maximum to 1 and bsp.tasks.maximum.per.job
> >>>>>> to 12 did not have the desired effect. I also split the input into
> >>>>>> 12 files (because of something in 0.5 that was fixed in 0.6).
> >>>>>>
> >>>>>> Could you recommend me some settings or guide me through the
> >>>>>> system's partition decision? I thought it would be:
> >>>>>>
> >>>>>> Input -> input split based on input and the max* conf values ->
> >>>>>> number of tasks; HashPartition.class distributes IDs across that
> >>>>>> number of tasks.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Benedikt
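A note on the partition decision sketched at the bottom of the quoted thread: the hash step simply maps each vertex ID to one of the N tasks by its hash code. A minimal sketch of the idea in plain Java (illustrative only, with a made-up class name, not necessarily the exact Hama HashPartitioner source):

    // Maps a vertex ID to one of numTasks partitions via its hash code.
    // Masking with Integer.MAX_VALUE keeps the result non-negative even
    // when hashCode() returns a negative value.
    public class SimpleHashPartitioner<K> {
        public int getPartition(K vertexId, int numTasks) {
            return (vertexId.hashCode() & Integer.MAX_VALUE) % numTasks;
        }
    }

With 12 tasks and about a million vertices, every partition receives work; if grooms still sit idle, the limiting factor is the number of tasks actually spawned (driven by the input blocks and the task slot settings above), not the partitioner.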