giraph-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Basic questions about Giraph internals
Date Fri, 07 Feb 2014 12:25:04 GMT
I tried the setup with one multithreaded worker per machine for the 
first time a few minutes ago on a cluster of 25 machines, and my job 
(closeness centrality estimation on a billion-edge graph) ran twice as 
fast!
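
For anyone comparing the two layouts from the quoted question below (8 machines, 6 cores each), here is a minimal sketch of the arithmetic. It assumes the worker count is passed with `-w` and the per-worker thread count with the `giraph.numComputeThreads` custom argument (`-ca`); please check both against your Giraph version. The function names are hypothetical and only illustrate the calculation.

```python
def one_thread_per_worker(machines, cores_per_machine):
    """Layout A: one single-threaded worker per core, with one core left for the master."""
    total_cores = machines * cores_per_machine
    return {"workers": total_cores - 1, "threads_per_worker": 1}


def one_worker_per_machine(machines, cores_per_machine):
    """Layout B: one worker per machine, with one compute thread per core."""
    return {"workers": machines, "threads_per_worker": cores_per_machine}


def giraph_runner_args(layout):
    """Build the layout-related arguments (assumed flags: -w and -ca giraph.numComputeThreads)."""
    return [
        "-w", str(layout["workers"]),
        "-ca", "giraph.numComputeThreads={}".format(layout["threads_per_worker"]),
    ]


if __name__ == "__main__":
    for name, layout_fn in [("A", one_thread_per_worker), ("B", one_worker_per_machine)]:
        layout = layout_fn(machines=8, cores_per_machine=6)
        print(name, layout, " ".join(giraph_runner_args(layout)))
```

Layout B keeps the total number of compute threads roughly the same while cutting the number of JVMs from 47 to 8, which is where the savings Claudio lists below would come from.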



On 02/07/2014 12:21 PM, Claudio Martella wrote:
> Yes, I think this is the best setup if you have control over your cluster.
> And yes, I have already tried that.
>
>
> On Fri, Feb 7, 2014 at 11:39 AM, Sundara Raghavan Sankaran <
> sundar@crayondata.com> wrote:
>
>>
>> On Fri, Feb 7, 2014 at 4:00 PM, Claudio Martella <
>> claudio.martella@gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Feb 7, 2014 at 9:44 AM, Alexander Frolov <
>>> alexndr.frolov@gmail.com> wrote:
>>>
>>>>   Thank you, I will try this. As I understand it, I should set the number
>>>>> of threads manually through the Giraph API.
>>>>>
>>>>> BTW, what is the conceptual difference between running multiple workers on
>>>>> the TaskTracker and running a single worker with multiple threads, in terms
>>>>> of vertex fetching, memory sharing, etc.?
>>>>>
>>>>
>>> Basically, better usage of resources: a single JVM, no duplication of
>>> core data structures, fewer netty threads and communication points, more
>>> locality (fewer messages over the network), fewer actors accessing ZooKeeper,
>>> etc.
>>>
>>
>> So, is it better to have one worker per machine, with as many threads as
>> the machine has cores? Suppose I have 8 machines with 6 cores each:
>> instead of running 47 Workers (1 thread per Worker) + 1 Master,
>> is it better to run 8 Workers (6 threads per Worker) + 1 Master? Have you
>> tried this already?
>>
>>
>>>
>>>>
>>>>>   Also, I would like to ask how message transfer between vertices is
>>>> implemented in terms of Hadoop primitives. A source code reference will be
>>>> enough.
>>>>
>>>
>>> Communication does not happen via Hadoop primitives, but ad-hoc via
>>> netty.
>>>
>>>
>>>
>>> --
>>>     Claudio Martella
>>>
>>>
>>
>> --
>> *Sundara Raghavan Sankaran*
>
>
>

