Subject: Re: out of core option
From: Yingyi Bu <buyingyi@gmail.com>
To: user@giraph.apache.org
Cc: sebastian.stipkovic@gmail.com, rvesse@dotnetrdf.org
Date: Thu, 23 Jan 2014 01:37:34 -0800 (PST)
Mailing-List: user@giraph.apache.org

Claudio,

    Great, thanks! Looking forward to the fix!

Best regards,
Yingyi


On Thu, Jan 23, 2014 at 1:34 AM, Claudio Martella
<claudio.martella@gmail.com> wrote:

> Yep, there's a bug. We're currently working on a fix. It should be
> ready in a few days.
>
>
> On Thu, Jan 23, 2014 at 5:10 AM, Yingyi Bu <buyingyi@gmail.com> wrote:
>
>> I just ran into the same issue with the latest trunk version.
>> Does anybody know how to fix it?
>>
>> Best regards,
>> Yingyi
>>
>>
>> On Fri, Dec 6, 2013 at 8:27 AM, Sebastian Stipkovic
>> <sebastian.stipkovic@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have found a link where someone describes the same problem:
>>>
>>> https://issues.apache.org/jira/browse/GIRAPH-788
>>>
>>> Can somebody help me? Does the out-of-core option only run on a
>>> particular Hadoop version?
>>>
>>>
>>> Thanks,
>>> Sebastian
>>>
>>>
>>> 2013/12/6 Sebastian Stipkovic <sebastian.stipkovic@gmail.com>
>>>
>>>> Hi Rob,
>>>>
>>>> Embarrassing. You are right.
>>>> But now, with the correct option, I get the following exception:
>>>>
>>>>
>>>> 2013-12-05 23:10:18,568 INFO org.apache.hadoop.mapred.JobTracker:
>>>> Adding task (MAP) 'attempt_201312052304_0001_m_000001_0' to tip
>>>> task_201312052304_0001_m_000001, for tracker
>>>> 'tracker_hduser:localhost/127.0.0.1:39793'
>>>> 2013-12-05 23:10:27,645 INFO org.apache.hadoop.mapred.TaskInProgress:
>>>> Error from attempt_201312052304_0001_m_000001_0:
>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception
>>>> waitFor: ExecutionException occurred while waiting for
>>>> org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
>>>>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
>>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>   at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException
>>>> occurred while waiting for
>>>> org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
>>>>   at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
>>>>   at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
>>>>   at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
>>>>   at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
>>>>   at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
>>>>   at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:281)
>>>>   at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:325)
>>>>   at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>>   at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:244)
>>>>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
>>>>   ... 7 more
>>>> Caused by: java.util.concurrent.ExecutionException:
>>>> java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve
>>>> partition 0
>>>>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
>>>>   at java.util.concurrent.FutureTask.get(FutureTask.java:119)
>>>>   at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
>>>>   at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
>>>>   ... 16 more
>>>> Caused by: java.lang.IllegalStateException: getOrCreatePartition:
>>>> cannot retrieve partition 0
>>>>   at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:243)
>>>>   at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
>>>>   at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
>>>>   at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:276)
>>>>   at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:172)
>>>>   at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>>>   at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>>>   at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>>>   at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:724)
>>>> Caused by: java.util.concurrent.ExecutionException:
>>>> java.lang.NullPointerException
>>>>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>>>>   at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>>>>   at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:228)
>>>>   ... 13 more
>>>> Caused by: java.lang.NullPointerException
>>>>   at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:692)
>>>>   at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:658)
>>>>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>>   at org.apache.giraph.partition.DiskBackedPartitionStore$DirectExecutorService.execute(DiskBackedPartitionStore.java:972)
>>>>   at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
>>>>   ... 14 more
>>>>
>>>>
>>>> Thanks,
>>>> Sebastian
>>>>
>>>>
>>>> 2013/12/5 Rob Vesse <rvesse@dotnetrdf.org>
>>>>
>>>>> Sebastian
>>>>>
>>>>> You've made a minor typo in the configuration setting, which means you
>>>>> haven't actually enabled out-of-core graph mode.
>>>>>
>>>>> You have *giraph.useOutOfCoreGiraph* when it should be
>>>>> *giraph.useOutOfCoreGraph*; note that the last word is Graph, not Giraph.
>>>>>
>>>>> Rob
>>>>>
>>>>> From: Sebastian Stipkovic <sebastian.stipkovic@gmail.com>
>>>>> Reply-To: <user@giraph.apache.org>
>>>>> Date: Thursday, 5 December 2013 20:39
>>>>> To: <user@giraph.apache.org>
>>>>> Subject: out of core option
>>>>>
>>>>> Hello,
>>>>>
>>>>> I had set up Giraph 1.1.0 with hadoop-0.20.203.0rc1 on a single-node
>>>>> cluster. It computes a tiny graph successfully. But if the input graph
>>>>> is huge (5 GB), I get an OutOfMemory (garbage collector) exception,
>>>>> even though I had turned on the out-of-core option. The job with the
>>>>> out-of-core option only works well with a tiny graph (0.9 GB). What is
>>>>> wrong? Do I have to do further configuration?
>>>>>
>>>>> My configurations are as follows:
>>>>>
>>>>> fs.s3n.impl = org.apache.hadoop.fs.s3native.NativeS3FileSystem
>>>>> mapred.task.cache.levels = 2
>>>>> giraph.vertexOutputFormatClass = org.apache.giraph.examples.MyShortestPaths$MyOutputFormat
>>>>> hadoop.tmp.dir = /app/hadoop/tmp
>>>>> hadoop.native.lib = true
>>>>> map.sort.class = org.apache.hadoop.util.QuickSort
>>>>> dfs.namenode.decommission.nodes.per.interval = 5
>>>>> dfs.https.need.client.auth = false
>>>>> ipc.client.idlethreshold = 4000
>>>>> dfs.datanode.data.dir.perm = 755
>>>>> mapred.system.dir = ${hadoop.tmp.dir}/mapred/system
>>>>> mapred.job.tracker.persist.jobstatus.hours = 0
>>>>> dfs.datanode.address = 0.0.0.0:50010
>>>>> dfs.namenode.logging.level = info
>>>>> dfs.block.access.token.enable = false
>>>>> io.skip.checksum.errors = false
>>>>> fs.default.name = hdfs://localhost:54310
>>>>> mapred.cluster.reduce.memory.mb = -1
>>>>> mapred.child.tmp = ./tmp
>>>>> fs.har.impl.disable.cache = true
>>>>> dfs.safemode.threshold.pct = 0.999f
>>>>> mapred.skip.reduce.max.skip.groups = 0
>>>>> dfs.namenode.handler.count = 10
>>>>> dfs.blockreport.initialDelay = 0
>>>>> mapred.heartbeats.in.second = 100
>>>>> mapred.tasktracker.dns.nameserver = default
>>>>> io.sort.factor = 10
>>>>> mapred.task.timeout = 600000
>>>>> giraph.maxWorkers = 1
>>>>> mapred.max.tracker.failures = 4
>>>>> hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.StandardSocketFactory
>>>>> mapred.job.tracker.jobhistory.lru.cache.size = 5
>>>>> fs.hdfs.impl = org.apache.hadoop.hdfs.DistributedFileSystem
>>>>> mapred.queue.default.acl-administer-jobs = *
>>>>> dfs.block.access.key.update.interval = 600
>>>>> mapred.skip.map.auto.incr.proc.count = true
>>>>> mapreduce.job.complete.cancel.delegation.tokens = true
>>>>> io.mapfile.bloom.size = 1048576
>>>>> mapreduce.reduce.shuffle.connect.timeout = 180000
>>>>> dfs.safemode.extension = 30000
>>>>> mapred.jobtracker.blacklist.fault-timeout-window = 180
>>>>> tasktracker.http.threads = 40
>>>>> mapred.job.shuffle.merge.percent = 0.66
>>>>> mapreduce.inputformat.class = org.apache.giraph.bsp.BspInputFormat
>>>>> fs.ftp.impl = org.apache.hadoop.fs.ftp.FTPFileSystem
>>>>> user.name = hduser
>>>>> mapred.output.compress = false
>>>>> io.bytes.per.checksum = 512
>>>>> giraph.isStaticGraph = true
>>>>> mapred.healthChecker.script.timeout = 600000
>>>>> topology.node.switch.mapping.impl = org.apache.hadoop.net.ScriptBasedMapping
>>>>> dfs.https.server.keystore.resource = ssl-server.xml
>>>>> mapred.reduce.slowstart.completed.maps = 0.05
>>>>> mapred.reduce.max.attempts = 4
>>>>> fs.ramfs.impl = org.apache.hadoop.fs.InMemoryFileSystem
>>>>> dfs.block.access.token.lifetime = 600
>>>>> dfs.name.edits.dir = ${dfs.name.dir}
>>>>> mapred.skip.map.max.skip.records = 0
>>>>> mapred.cluster.map.memory.mb = -1
>>>>> hadoop.security.group.mapping = org.apache.hadoop.security.ShellBasedUnixGroupsMapping
>>>>> mapred.job.tracker.persist.jobstatus.dir = /jobtracker/jobsInfo
>>>>> mapred.jar = hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001/job.jar
>>>>> dfs.block.size = 67108864
>>>>> fs.s3.buffer.dir = ${hadoop.tmp.dir}/s3
>>>>> job.end.retry.attempts = 0
>>>>> fs.file.impl = org.apache.hadoop.fs.LocalFileSystem
>>>>> mapred.local.dir.minspacestart = 0
>>>>> mapred.output.compression.type = RECORD
>>>>> dfs.datanode.ipc.address = 0.0.0.0:50020
>>>>> dfs.permissions = true
>>>>> topology.script.number.args = 100
>>>>> io.mapfile.bloom.error.rate = 0.005
>>>>> mapred.cluster.max.reduce.memory.mb = -1
>>>>> mapred.max.tracker.blacklists = 4
>>>>> mapred.task.profile.maps = 0-2
>>>>> dfs.datanode.https.address = 0.0.0.0:50475
>>>>> mapred.userlog.retain.hours = 24
>>>>> dfs.secondary.http.address = 0.0.0.0:50090
>>>>> dfs.replication.max = 512
>>>>> mapred.job.tracker.persist.jobstatus.active = false
>>>>> hadoop.security.authorization = false
>>>>> local.cache.size = 10737418240
>>>>> dfs.namenode.delegation.token.renew-interval = 86400000
>>>>> mapred.min.split.size = 0
>>>>> mapred.map.tasks = 2
>>>>> mapred.child.java.opts = -Xmx4000m
>>>>> mapreduce.job.counters.limit = 120
>>>>> dfs.https.client.keystore.resource = ssl-client.xml
>>>>> mapred.job.queue.name = default
>>>>> dfs.https.address = 0.0.0.0:50470
>>>>> mapred.job.tracker.retiredjobs.cache.size = 1000
>>>>> dfs.balance.bandwidthPerSec = 1048576
>>>>> ipc.server.listen.queue.size = 128
>>>>> mapred.inmem.merge.threshold = 1000
>>>>> job.end.retry.interval = 30000
>>>>> mapred.skip.attempts.to.start.skipping = 2
>>>>> fs.checkpoint.dir = ${hadoop.tmp.dir}/dfs/namesecondary
>>>>> mapred.reduce.tasks = 0
>>>>> mapred.merge.recordsBeforeProgress = 10000
>>>>> mapred.userlog.limit.kb = 0
>>>>> mapred.job.reduce.memory.mb = -1
>>>>> dfs.max.objects = 0
>>>>> webinterface.private.actions = false
>>>>> io.sort.spill.percent = 0.80
>>>>> mapred.job.shuffle.input.buffer.percent = 0.70
>>>>> mapred.job.name = Giraph: org.apache.giraph.examples.MyShortestPaths
>>>>> dfs.datanode.dns.nameserver = default
>>>>> mapred.map.tasks.speculative.execution = false
>>>>> hadoop.util.hash.type = murmur
>>>>> dfs.blockreport.intervalMsec = 3600000
>>>>> mapred.map.max.attempts = 0
>>>>> mapreduce.job.acl-view-job =
>>>>> dfs.client.block.write.retries = 3
>>>>> mapred.job.tracker.handler.count = 10
>>>>> mapreduce.reduce.shuffle.read.timeout = 180000
>>>>> mapred.tasktracker.expiry.interval = 600000
>>>>> dfs.https.enable = false
>>>>> mapred.jobtracker.maxtasks.per.job = -1
>>>>> mapred.jobtracker.job.history.block.size = 3145728
>>>>> giraph.useOutOfCoreGiraph = true
>>>>> keep.failed.task.files = false
>>>>> mapreduce.outputformat.class = org.apache.giraph.bsp.BspOutputFormat
>>>>> dfs.datanode.failed.volumes.tolerated = 0
>>>>> ipc.client.tcpnodelay = false
>>>>> mapred.task.profile.reduces = 0-2
>>>>> mapred.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
>>>>> io.map.index.skip = 0
>>>>> mapred.working.dir = hdfs://localhost:54310/user/hduser
>>>>> ipc.server.tcpnodelay = false
>>>>> mapred.jobtracker.blacklist.fault-bucket-width = 15
>>>>> dfs.namenode.delegation.key.update-interval = 86400000
>>>>> mapred.used.genericoptionsparser = true
>>>>> mapred.mapper.new-api = true
>>>>> mapred.job.map.memory.mb = -1
>>>>> giraph.vertex.input.dir = hdfs://localhost:54310/user/hduser/output
>>>>> dfs.default.chunk.view.size = 32768
>>>>> hadoop.logfile.size = 10000000
>>>>> mapred.reduce.tasks.speculative.execution = true
>>>>> mapreduce.job.dir = hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001
>>>>> mapreduce.tasktracker.outofband.heartbeat = false
>>>>> mapreduce.reduce.input.limit = -1
>>>>> dfs.datanode.du.reserved = 0
>>>>> hadoop.security.authentication = simple
>>>>> fs.checkpoint.period = 3600
>>>>> dfs.web.ugi = webuser,webgroup
>>>>> mapred.job.reuse.jvm.num.tasks = 1
>>>>> mapred.jobtracker.completeuserjobs.maximum = 100
>>>>> dfs.df.interval = 60000
>>>>> dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
>>>>> mapred.task.tracker.task-controller = org.apache.hadoop.mapred.DefaultTaskController
>>>>> giraph.minWorkers = 1
>>>>> fs.s3.maxRetries = 4
>>>>> dfs.datanode.dns.interface = default
>>>>> mapred.cluster.max.map.memory.mb = -1
>>>>> dfs.support.append = false
>>>>> mapreduce.job.acl-modify-job =
>>>>> dfs.permissions.supergroup = supergroup
>>>>> mapred.local.dir = ${hadoop.tmp.dir}/mapred/local
>>>>> fs.hftp.impl = org.apache.hadoop.hdfs.HftpFileSystem
>>>>> fs.trash.interval = 0
>>>>> fs.s3.sleepTimeSeconds = 10
>>>>> dfs.replication.min = 1
>>>>> mapred.submit.replication = 10
>>>>> fs.har.impl = org.apache.hadoop.fs.HarFileSystem
>>>>> mapred.map.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
>>>>> mapred.tasktracker.dns.interface = default
>>>>> dfs.namenode.decommission.interval = 30
>>>>> dfs.http.address = 0.0.0.0:50070
>>>>> dfs.heartbeat.interval = 3
>>>>> mapred.job.tracker = localhost:54311
>>>>> mapreduce.job.submithost = hduser
>>>>> io.seqfile.sorter.recordlimit = 1000000
>>>>> giraph.vertexInputFormatClass = org.apache.giraph.examples.MyShortestPaths$MyInputFormat
>>>>> dfs.name.dir = ${hadoop.tmp.dir}/dfs/name
>>>>> mapred.line.input.format.linespermap = 1
>>>>> mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.JobQueueTaskScheduler
>>>>> dfs.datanode.http.address = 0.0.0.0:50075
>>>>> mapred.local.dir.minspacekill = 0
>>>>> dfs.replication.interval = 3
>>>>> io.sort.record.percent = 0.05
>>>>> fs.kfs.impl = org.apache.hadoop.fs.kfs.KosmosFileSystem
>>>>> mapred.temp.dir = ${hadoop.tmp.dir}/mapred/temp
>>>>> mapred.tasktracker.reduce.tasks.maximum = 2
>>>>> mapreduce.job.user.classpath.first = true
>>>>> dfs.replication = 1
>>>>> fs.checkpoint.edits.dir = ${fs.checkpoint.dir}
>>>>> giraph.computationClass = org.apache.giraph.examples.MyShortestPaths
>>>>> mapred.tasktracker.tasks.sleeptime-before-sigkill = 5000
>>>>> mapred.job.reduce.input.buffer.percent = 0.0
>>>>> mapred.tasktracker.indexcache.mb = 10
>>>>> mapreduce.job.split.metainfo.maxsize = 10000000
>>>>> hadoop.logfile.count = 10
>>>>> mapred.skip.reduce.auto.incr.proc.count = true
>>>>> mapreduce.job.submithostaddress = 127.0.1.1
>>>>> io.seqfile.compress.blocksize = 1000000
>>>>> fs.s3.block.size = 67108864
>>>>> mapred.tasktracker.taskmemorymanager.monitoring-interval = 5000
>>>>> giraph.minPercentResponded = 100.0
>>>>> mapred.queue.default.state = RUNNING
>>>>> mapred.acls.enabled = false
>>>>> mapreduce.jobtracker.staging.root.dir = ${hadoop.tmp.dir}/mapred/staging
>>>>> mapred.queue.names = default
>>>>> dfs.access.time.precision = 3600000
>>>>> fs.hsftp.impl = org.apache.hadoop.hdfs.HsftpFileSystem
>>>>> mapred.task.tracker.http.address = 0.0.0.0:50060
>>>>> mapred.reduce.parallel.copies = 5
>>>>> io.seqfile.lazydecompress = true
>>>>> mapred.output.dir = /user/hduser/output/shortestpaths
>>>>> io.sort.mb = 100
>>>>> ipc.client.connection.maxidletime = 10000
>>>>> mapred.compress.map.output = false
>>>>> hadoop.security.uid.cache.secs = 14400
>>>>> mapred.task.tracker.report.address = 127.0.0.1:0
>>>>> mapred.healthChecker.interval = 60000
>>>>> ipc.client.kill.max = 10
>>>>> ipc.client.connect.max.retries = 10
>>>>> ipc.ping.interval = 300000
>>>>> mapreduce.user.classpath.first = true
>>>>> mapreduce.map.class = org.apache.giraph.graph.GraphMapper
>>>>> fs.s3.impl = org.apache.hadoop.fs.s3.S3FileSystem
>>>>> mapred.user.jobconf.limit = 5242880
>>>>> mapred.job.tracker.http.address = 0.0.0.0:50030
>>>>> io.file.buffer.size = 4096
>>>>> mapred.jobtracker.restart.recover = false
>>>>> io.serializations = org.apache.hadoop.io.serializer.WritableSerialization
>>>>> dfs.datanode.handler.count = 3
>>>>> mapred.reduce.copy.backoff = 300
>>>>> mapred.task.profile = false
>>>>> dfs.replication.considerLoad = true
>>>>> jobclient.output.filter = FAILED
>>>>> dfs.namenode.delegation.token.max-lifetime = 604800000
>>>>> mapred.tasktracker.map.tasks.maximum = 4
>>>>> io.compression.codecs = org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec
>>>>> fs.checkpoint.size = 67108864
>>>>>
>>>>> Additionally, if I have more than one worker, I get an exception too.
>>>>> Are my configurations wrong?
>>>>>
>>>>>
>>>>> best regards,
>>>>> Sebastian
>>>>>
>
>
> --
> Claudio Martella
> claudio.martella@gmail.com
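
[Archive editor's note] The root cause in this thread is the misspelled property: with *giraph.useOutOfCoreGiraph* the setting is silently ignored and the job runs fully in memory. A minimal sketch of a GiraphRunner invocation with the option spelled correctly follows. The jar name, input/output paths, and partition count are illustrative placeholders, not taken from this thread; `giraph.maxPartitionsInMemory` is assumed to be available in this Giraph version to bound how many partitions stay in memory.

```shell
# Sketch: run the shortest-paths example with out-of-core graph support.
# Only the property name giraph.useOutOfCoreGraph (Graph, not Giraph)
# comes from the discussion above; everything else is a placeholder.
hadoop jar giraph-examples-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.MyShortestPaths \
  -vif org.apache.giraph.examples.MyShortestPaths\$MyInputFormat \
  -vip /user/hduser/input \
  -vof org.apache.giraph.examples.MyShortestPaths\$MyOutputFormat \
  -op /user/hduser/output/shortestpaths \
  -w 1 \
  -ca giraph.useOutOfCoreGraph=true \
  -ca giraph.maxPartitionsInMemory=1
```

Note that even with the correct spelling, Giraph versions from this period hit the DiskBackedPartitionStore NullPointerException shown above (GIRAPH-788), which Claudio confirms is a bug with a fix in progress.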
mapred.job.reuse.jvm.num.tasks1=
mapred.jobtracker.completeuserjobs.maxim= um 100
dfs.df.interval= 60000
dfs.data.dir= ${hadoop.tmp.dir}/dfs/data
mapred.task.tracker.task-controllerorg.apache= .hadoop.mapred.DefaultTaskController
gira= ph.minWorkers1
fs.s3.maxRetries 4
dfs.datanode.dns.inte= rfacedefault
m= apred.cluster.max.map.memory.mb-1
false
mapreduce.job.acl-modify-job
dfs.permissions.supergroup supergroup
mapred.local.dir${hadoop.tmp.dir}/mapred/local
= fs.hftp.implorg.apache.hadoop.hdfs.HftpFileSy= stem
fs.trash.interval0=
fs.s3.sleepTimeSeconds10
dfs.replication.min
mapred.submit.replication10
fs.har.implorg.apache.hadoop.fs.HarFileSystem
mapred.map.output.compression.codec org.apache.hadoop.io.compress.DefaultCodec
default
dfs.namenode.decommission.int= erval 30
dfs.http.address= 0= .0.0.0:50070
dfs.heartbeat.interval 3
mapred.job.trackerlocalhost:= 54311
mapreduce.job.submithost hduser
io.seqfile.sorter.recordlimit<= /td>1000000
giraph.vert= exInputFormatClassorg.apache.giraph.examples.MyS= hortestPaths$MyInputFormat
dfs.name.dir${hado= op.tmp.dir}/dfs/name
mapred.line.input.fo= rmat.linespermap1
mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.JobQueueTaskScheduler
dfs.datanode.http.address0.0.0.0:50075
mapred.local.dir.minspacekill0
dfs.replication.interval3
io.sort.record.percent 0.05
fs.kfs.implorg.apache.hadoop.fs.kfs.KosmosFileSystem
mapred.temp.dir${hadoop.tmp= .dir}/mapred/temp
mapred.tasktracker.reduce.tasks.maximum2
mapreduce.job.user= .classpath.firsttrue
dfs.replication 1
fs.checkpoint.edits.d= ir${fs.checkpoint.dir}
giraph.computationClassorg.apache.girap= h.examples.MyShortestPaths
mapred.tasktracker.tasks.sleeptime-before-sig= kill5000
mapre= d.job.reduce.input.buffer.percent0.0
mapred.tasktracker.indexcache.mb10
mapreduce.job.split.metainfo.maxsize10000000
hadoop.logfile.count= 10
mapred.skip.reduce.a= uto.incr.proc.counttrue
mapreduce.job.submithostaddress127.0= .1.1
io.seqfile.compress.blocksize1000000
fs.s3.block.size67108864
mapred.task= tracker.taskmemorymanager.monitoring-interval 5000
giraph.minPercentR= esponded100.0
= mapred.queue.default.stateRUNNING
false
mapreduce.jobtracker.staging.root.dir${hadoop.tmp.dir}/mapred/staging
mapre= d.queue.names default
dfs.access.time= .precision3600000
fs.hsftp.implorg.apache.hadoop.hdfs.HsftpFil= eSystem
mapred.task.tracker.http.address0.0.0.0:50= 060
mapred.reduce.parallel.copies= 5
io.seqfile.lazydecompresstrue
mapred.output.dir/user/hduser/output/shortestpaths
io.sort.mb100
= ipc.client.connection.maxidletime10000
mapred.compress.map.output false
hadoop.security.uid.cache.secs<= /td>14400
mapred.task.t= racker.report.address127.0.0.1:0
mapred.healthChecker.interval60000
ipc.client.kill.max<= /td>10
ipc.client.conne= ct.max.retries 10
ipc.ping.interval300000
mapreduce.= user.classpath.firsttrue
mapreduce.map.classorg.apache.giraph.graph.Gr= aphMapper
fs.s3.implorg.apache.hadoop.fs.s3.S3FileSystem
<= b>mapred.user.jobconf.limit 5242880
mapred.job.trac= ker.http.address0.0.0.0:50030
io.f= ile.buffer.size 4096
mapred.jobtracker.= restart.recoverfalse
io.serializationsorg.apache.hadoop.io.ser= ializer.WritableSerialization
dfs.datanode.handler.count3
mapred.reduce.copy.backoff= 300
mapred.task.pr= ofile false
dfs.replication.c= onsiderLoadtrue
<= b>jobclient.output.filterFAILED
604= 800000
mapred.tasktracker.map.tasks.maxim= um4
io.compres= sion.codecs org.apache.hadoop.io.compress.DefaultCodec,org.apache.had= oop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec
fs.checkpoint.size6710= 8864


Additionally, if I use more than one worker, I get an exception as well. Are my configurations wrong?
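For reference, out-of-core mode is usually enabled through GiraphRunner's custom arguments rather than by editing the job configuration by hand. The sketch below is hypothetical: the jar name is a placeholder, the class names and HDFS paths are taken from the configuration dump above, and it assumes the giraph.useOutOfCoreGraph option from GiraphConstants (note the dump shows the key as giraph.useOutOfCoreGiraph, which may be worth double-checking):

```shell
# Hypothetical invocation; jar name and paths are placeholders.
hadoop jar giraph-examples-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.MyShortestPaths \
    -vif org.apache.giraph.examples.MyShortestPaths\$MyInputFormat \
    -vip hdfs://localhost:54310/user/hduser/output \
    -op /user/hduser/output/shortestpaths \
    -w 1 \
    -ca giraph.useOutOfCoreGraph=true
```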


Best regards,
Sebastian






--
   Claudio Martella
   claudio.martella@gmail.com
