giraph-user mailing list archives

From José Luis Larroque <larroques...@gmail.com>
Subject Re: giraph.numInputThreads execution time for "input superstep" it's the same using 1 or 8 threads, how this can be possible?
Date Thu, 25 Aug 2016 22:36:11 GMT
The cluster used for this was 1 master and 1 slave, both r3.8xlarge
EC2 instances on AWS.

2016-08-25 19:26 GMT-03:00 José Luis Larroque <larroquester@gmail.com>:

> I'm running a breadth-first search (BFS) over the Spanish edition of
> Wikipedia. I converted the [dump][1] into a file that Giraph can read.
>
> Using 1 worker, loading a 1 GB file took about 452 seconds in the input
> superstep (526 seconds for the whole job). I ran Giraph with this command:
>
>     /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar \
>       ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote \
>       -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat \
>       -vip /user/hduser/input/grafo-wikipedia.txt \
>       -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat \
>       -op /user/hduser/output/caminosNavegacionales \
>       -w 1 -yh 120000 \
>       -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true
>
> Container logs:
>
>     16/08/24 21:17:02 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 1 input threads
>     16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>     16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>     16/08/24 21:17:02 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY checkWorkers: Done - Found 1 responses of 1 needed to start superstep -1
>     16/08/24 21:17:02 INFO netty.NettyClient: Using Netty without authentication.
>     16/08/24 21:17:02 INFO netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>     16/08/24 21:17:02 INFO partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>     ...
>     16/08/24 21:25:40 INFO netty.NettyClient: stop: Halting netty client
>     16/08/24 21:25:40 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
>     16/08/24 21:25:43 INFO netty.NettyClient: stop: Netty client halted
>     16/08/24 21:25:43 INFO netty.NettyServer: stop: Halting netty server
>     16/08/24 21:25:43 INFO netty.NettyServer: stop: Start releasing resources
>     16/08/24 21:25:44 INFO bsp.BspService: process: cleanedUpChildrenChanged signaled
>     16/08/24 21:25:47 INFO netty.NettyServer: stop: Netty server halted
>     16/08/24 21:25:47 INFO bsp.BspService: process: masterElectionChildrenChanged signaled
>     16/08/24 21:25:47 INFO master.MasterThread: setup: Took 0.898 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: input superstep: Took 452.531 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: superstep 0: Took 64.376 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: superstep 1: Took 1.591 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: shutdown: Took 6.609 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: total: Took 526.006 seconds.
>
> As you can see, the first log line tells us that the input superstep ran
> with only **one** thread, and the MasterThread summary shows it took
> 452.5 seconds to finish.
>
> I ran another test with giraph.numInputThreads=8, trying to run the input
> superstep with 8 threads:
>
>     /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar \
>       ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote \
>       -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat \
>       -vip /user/hduser/input/grafo-wikipedia.txt \
>       -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat \
>       -op /user/hduser/output/caminosNavegacionales \
>       -w 1 -yh 120000 \
>       -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,giraph.numInputThreads=8
>
> The result was the following one:
>
>     16/08/24 21:54:00 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 8 input threads
>     16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>     16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>     ...
>
>     16/08/24 22:10:07 INFO master.MasterThread: setup: Took 0.093 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: input superstep: Took 891.339 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: superstep 0: Took 66.635 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: superstep 1: Took 1.837 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: shutdown: Took 6.605 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: total: Took 966.512 seconds.
>
>
> So my question is: how is it possible that the input superstep takes about
> 452 seconds with one input thread and 891 seconds with eight? It should be
> exactly the opposite, right?
>
>
>   [1]: https://dumps.wikimedia.org/eswiki/20160601/ "dump"
>
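
To put the two runs side by side, here is a minimal Python sketch that parses the MasterThread timing lines quoted above and compares them phase by phase. The constant and function names (`RUN_1_THREAD`, `RUN_8_THREADS`, `parse_phases`, `compare`) are my own and hypothetical; the regular expression assumes the lines have been unwrapped into the one-line form shown in the strings below. It only tabulates the numbers already in the logs and does not explain the slowdown.

```python
import re

# MasterThread summary lines copied from the two runs above
# (timestamps dropped, wrapped lines rejoined).
RUN_1_THREAD = """
INFO master.MasterThread: setup: Took 0.898 seconds.
INFO master.MasterThread: input superstep: Took 452.531 seconds.
INFO master.MasterThread: superstep 0: Took 64.376 seconds.
INFO master.MasterThread: superstep 1: Took 1.591 seconds.
INFO master.MasterThread: shutdown: Took 6.609 seconds.
INFO master.MasterThread: total: Took 526.006 seconds.
"""

RUN_8_THREADS = """
INFO master.MasterThread: setup: Took 0.093 seconds.
INFO master.MasterThread: input superstep: Took 891.339 seconds.
INFO master.MasterThread: superstep 0: Took 66.635 seconds.
INFO master.MasterThread: superstep 1: Took 1.837 seconds.
INFO master.MasterThread: shutdown: Took 6.605 seconds.
INFO master.MasterThread: total: Took 966.512 seconds.
"""

# Assumed line shape: "MasterThread: <phase>: Took <seconds> seconds."
PHASE = re.compile(r"MasterThread: (?P<phase>[\w ]+): Took (?P<secs>[\d.]+) seconds")


def parse_phases(log_text):
    """Return {phase name: seconds} for every MasterThread timing line."""
    return {m["phase"]: float(m["secs"]) for m in PHASE.finditer(log_text)}


def compare(base, other):
    """Return (phase, base seconds, other seconds, other/base) per phase."""
    return [(phase, secs, other[phase], other[phase] / secs)
            for phase, secs in base.items()]


one_thread = parse_phases(RUN_1_THREAD)
eight_threads = parse_phases(RUN_8_THREADS)

for phase, t1, t8, ratio in compare(one_thread, eight_threads):
    print(f"{phase:16s} {t1:9.3f}s {t8:9.3f}s  x{ratio:.2f}")
```

Run as-is, this shows that the input superstep accounts for nearly all of the difference in total time between the two jobs, while the compute supersteps and shutdown take roughly the same time in both.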
