giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Problem processing large graph
Date Wed, 03 Sep 2014 17:23:01 GMT
Hi Tripti,

Is there a chance you can use higher-memory machines so you don't need to
run out of core?  We do it this way at Facebook.  We haven't tested the
out-of-core option.

Avery

On 8/31/14, 2:34 PM, Tripti Singh wrote:
> Hi,
> I am able to build Giraph 1.1 with the hadoop_yarn profile.
> I am also able to test-run Connected Components on a small dataset.
> However, I am seeing two issues when running on a bigger dataset with
> 400 mappers:
>
>  1. I am unable to use the out-of-core graph option. It errors out saying
>     that it cannot read the INIT partition. (Sorry, I don’t have the log
>     right now, but I will share it after I run the job again.)
>     I expect that if the out-of-core option is fixed, I should
>     be able to run the workflow with fewer mappers.
>  2. To get the workflow running anyway, I removed the out-of-core
>     option and adjusted the heap size. This also works on the smaller
>     dataset but fails on the large one.
>     Worker logs are mostly empty. The non-empty logs end like this:
>         mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
>         [STATUS: task-374] setup: Beginning worker setup.
>         setup: Log level remains at info
>         [STATUS: task-374] setup: Initializing Zookeeper services.
>         mapred.job.id is deprecated. Instead, use mapreduce.job.id
>         job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
>         [STATUS: task-374] setup: Setting up Zookeeper manager.
>         createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614
>         createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614/_zkServer
>         createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/giraph_yarn_application_1407992474095_708614/_task/gsta33201.tan.ygrid.yahoo.com 374
>         getZooKeeperServerList: For task 374, got file 'null' (polling period is 3000)
>
>     The master log has statements for launching containers, opening
>     proxies, and processing events, like this:
>         Opening proxy : gsta31118.tan.ygrid.yahoo.com:8041
>         Processing Event EventType: QUERY_CONTAINER for Container container_1407992474095_708614_01_000314
>         ……
>
> I am not using SASL authentication.
> Any idea what might be wrong?
>
> Thanks,
> Tripti.
>
>

