From: Claudio Martella
Date: Fri, 7 Feb 2014 12:21:30 +0100
Subject: Re: Basic questions about Giraph internals
To: "user@giraph.apache.org"

Yes, I think this is the best setup if you have control over your cluster. And yes, I have already tried that.

On Fri, Feb 7, 2014 at 11:39 AM, Sundara Raghavan Sankaran <sundar@crayondata.com> wrote:

> On Fri, Feb 7, 2014 at 4:00 PM, Claudio Martella <claudio.martella@gmail.com> wrote:
>
>> On Fri, Feb 7, 2014 at 9:44 AM, Alexander Frolov <alexndr.frolov@gmail.com> wrote:
>>
>>> Thank you, I will try to do this. As I understand it, I should set the number of threads manually through the Giraph API.
>>>
>>> BTW, what is the conceptual difference between running multiple workers on the TaskTracker and running a single worker with multiple threads, in terms of vertex fetching, memory sharing, etc.?
>>
>> Basically, better usage of resources: a single JVM, no duplication of core data structures, fewer Netty threads and communication points, more locality (fewer messages over the network), fewer actors accessing ZooKeeper, etc.
>
> So, is it better to have one worker per machine, with as many threads as the machine has cores? Suppose I have 8 machines with 6 cores each: instead of running 47 workers (1 thread per worker) + 1 master, is it better to run 8 workers (6 threads per worker) + 1 master? Have you tried this already?
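Sundara's comparison can be checked with some back-of-envelope arithmetic. The Java (16+) sketch below is a hypothetical helper, not part of Giraph. It assumes identical machines, a full mesh of worker-to-worker connections, and vertices spread uniformly over workers by hashing, so that the expected share of messages staying inside the sender's JVM is 1/W for W workers. The `-w` and `-ca giraph.numComputeThreads=...` strings are the GiraphRunner options one would likely use to request each layout; verify them against the Giraph version in use.

```java
import java.util.Locale;

/**
 * Hypothetical back-of-envelope calculator comparing two Giraph layouts on a
 * cluster of identical machines. Not part of Giraph itself.
 */
public class LayoutComparison {

    /** One candidate layout: how many workers and how many compute threads each. */
    record Layout(String name, int workers, int threadsPerWorker) {

        /** Directed worker-to-worker links, assuming a full mesh (a simplification). */
        long workerLinks() {
            return (long) workers * (workers - 1);
        }

        /** Expected share of messages whose target lives in the sender's own JVM,
         *  assuming vertices are hashed uniformly over workers. */
        double sameJvmMessageShare() {
            return 1.0 / workers;
        }

        /** Arguments one might pass to GiraphRunner for this layout (assumed flags). */
        String runnerArgs() {
            return "-w " + workers + " -ca giraph.numComputeThreads=" + threadsPerWorker;
        }
    }

    /** One single-threaded worker per core, leaving one core for the master. */
    static Layout oneWorkerPerCore(int machines, int coresPerMachine) {
        return new Layout("one worker per core", machines * coresPerMachine - 1, 1);
    }

    /** One worker per machine, with one compute thread per core. */
    static Layout oneWorkerPerMachine(int machines, int coresPerMachine) {
        return new Layout("one worker per machine", machines, coresPerMachine);
    }

    public static void main(String[] args) {
        int machines = 8;
        int cores = 6; // the cluster described in the thread
        Layout[] layouts = {
            oneWorkerPerCore(machines, cores),
            oneWorkerPerMachine(machines, cores)
        };
        for (Layout l : layouts) {
            System.out.printf(Locale.ROOT,
                "%-24s workers=%2d links=%4d same-JVM share=%.3f args: %s%n",
                l.name(), l.workers(), l.workerLinks(),
                l.sameJvmMessageShare(), l.runnerArgs());
        }
    }
}
```

For the 8 x 6 cluster in the thread this gives 47 * 46 = 2,162 directed links versus 8 * 7 = 56, and a same-JVM message share of about 2% versus 12.5%, which is the locality argument Claudio makes above.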
>
>>> Also, I would like to ask how message transfer between vertices is implemented in terms of Hadoop primitives. A source code reference will be enough.
>>
>> Communication does not happen via Hadoop primitives, but ad hoc via Netty.
>>
>> --
>> Claudio Martella
>
> --
> Sundara Raghavan Sankaran

--
Claudio Martella
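Claudio's point that messages travel over Netty rather than through Hadoop primitives can be made concrete with a toy model. The Java (16+) sketch below is hypothetical and much simpler than Giraph's real message path, which lives in the org.apache.giraph.comm packages (for example org.apache.giraph.comm.netty). Each worker owns the vertices that hash to it; messages to local vertices are delivered in memory, and messages to remote vertices are buffered per destination worker so that each buffer could be flushed as one network request. The modulo-hash owner function is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration: route messages locally or batch them per remote worker. */
public class MessageBatcher {

    /** Minimal message: target vertex id and payload. */
    record Message(long targetVertex, double value) {}

    private final int numWorkers;
    private final int myWorker;
    // One buffer per remote worker; a real system would flush each as a single request.
    private final Map<Integer, List<Message>> outbox = new TreeMap<>();
    // Messages whose target vertex is owned by this worker never leave the JVM.
    private final List<Message> localInbox = new ArrayList<>();

    MessageBatcher(int numWorkers, int myWorker) {
        this.numWorkers = numWorkers;
        this.myWorker = myWorker;
    }

    /** Assumed owner lookup: hash the vertex id and take it modulo the worker count. */
    int ownerOf(long vertexId) {
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    /** Deliver locally if this worker owns the target, otherwise buffer for the owner. */
    void send(long targetVertex, double value) {
        Message m = new Message(targetVertex, value);
        int owner = ownerOf(targetVertex);
        if (owner == myWorker) {
            localInbox.add(m);
        } else {
            outbox.computeIfAbsent(owner, w -> new ArrayList<>()).add(m);
        }
    }

    Map<Integer, List<Message>> outbox() {
        return outbox;
    }

    List<Message> localInbox() {
        return localInbox;
    }

    public static void main(String[] args) {
        MessageBatcher batcher = new MessageBatcher(4, 0);
        for (long v = 0; v < 10; v++) {
            batcher.send(v, 1.0);
        }
        System.out.println("local: " + batcher.localInbox().size());
        batcher.outbox().forEach((worker, msgs) ->
            System.out.println("to worker " + worker + ": " + msgs.size()));
    }
}
```

With 4 workers and vertex ids 0 to 9, worker 0 keeps 3 messages in memory and buffers 3, 2 and 2 messages for workers 1, 2 and 3. Fewer, larger workers raise the in-memory share, which matches the locality argument made earlier in the thread.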