On Fri, Feb 7, 2014 at 4:00 PM, Claudio Martella <claudio.martella@gmail.com> wrote:



On Fri, Feb 7, 2014 at 9:44 AM, Alexander Frolov <alexndr.frolov@gmail.com> wrote:
Thank you, I will try to do this. As I understand it, I should set the number of threads manually through the Giraph API.

BTW, what is the conceptual difference between running multiple workers on a TaskTracker and running a single worker with multiple threads, in terms of vertex fetching, memory sharing, etc.?

Basically, better use of resources: a single JVM, no duplication of core data structures, fewer netty threads and communication endpoints, more locality (fewer messages over the network), fewer actors accessing ZooKeeper, etc.

So, is it better to have one worker per machine, with as many threads as the machine has cores? Suppose I have 8 machines with 6 cores each: instead of running 47 Workers (1 thread per Worker) + 1 Master, is it better to run 8 Workers (6 threads per Worker) + 1 Master? Have you tried this already?
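
To make the arithmetic of the two layouts concrete, here is a minimal sketch in Python. It only counts tasks and threads for the 8-machine, 6-core example above; it does not talk to Giraph. The helper names are hypothetical, and the `-w` / `-ca giraph.numComputeThreads=N` options it prints are my assumption of how the worker count and compute threads would be passed to GiraphRunner, so please check them against your Giraph version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layout:
    workers: int             # number of Giraph worker tasks
    threads_per_worker: int  # compute threads inside each worker JVM
    jvms: int                # total task JVMs, including the master


def one_thread_per_worker(machines: int, cores: int) -> Layout:
    # One single-threaded task per core; one of those slots goes to the master.
    slots = machines * cores
    return Layout(workers=slots - 1, threads_per_worker=1, jvms=slots)


def one_worker_per_machine(machines: int, cores: int) -> Layout:
    # One multithreaded worker per machine, plus a separate master task.
    return Layout(workers=machines, threads_per_worker=cores, jvms=machines + 1)


def runner_args(layout: Layout) -> List[str]:
    # Assumed GiraphRunner options: -w for worker count, -ca for custom arguments.
    return [
        "-w", str(layout.workers),
        "-ca", f"giraph.numComputeThreads={layout.threads_per_worker}",
    ]


if __name__ == "__main__":
    for layout in (one_thread_per_worker(8, 6), one_worker_per_machine(8, 6)):
        print(layout, " ".join(runner_args(layout)))
```

The second layout runs 9 JVMs instead of 48, which is where the savings in duplicated data structures and netty endpoints would come from.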

 

Also, I would like to ask how message transfer between vertices is implemented in terms of Hadoop primitives. A source code reference will be enough.

Communication does not happen via Hadoop primitives; it is done ad hoc, directly over netty.



--
   Claudio Martella
     

--
Sundara Raghavan Sankaran


      