giraph-user mailing list archives

From Sebastian Stipkovic <sebastian.stipko...@gmail.com>
Subject Re: out of core option
Date Thu, 05 Dec 2013 22:49:25 GMT
Hi Ameya,

thanks for the answer. The memory I had allocated was too high: my server has
4000M in total. I have now reduced the memory to 2000M for each Mapper.
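
The memory figures here can be checked with simple arithmetic. The following is a minimal sketch, assuming that both map tasks from the posted configuration (mapred.map.tasks=2) run at the same time and that each one receives the full -Xmx value. HeapBudget and its methods are hypothetical names used for illustration; they are not part of Hadoop or Giraph.

```java
// Hypothetical helper for illustration only; not a Hadoop or Giraph API.
final class HeapBudget {

    /** Total heap requested when every concurrent mapper gets the same -Xmx value. */
    static long requestedHeapMb(int concurrentMappers, long xmxPerMapperMb) {
        return (long) concurrentMappers * xmxPerMapperMb;
    }

    /** True if the requested heap does not exceed the server's total memory. */
    static boolean fits(int concurrentMappers, long xmxPerMapperMb, long serverMemoryMb) {
        return requestedHeapMb(concurrentMappers, xmxPerMapperMb) <= serverMemoryMb;
    }

    public static void main(String[] args) {
        long serverMb = 4000; // total server memory reported in this message
        int mappers = 2;      // mapred.map.tasks in the posted configuration

        // Old setting: -Xmx4000m per mapper.
        System.out.println("4000M per mapper fits: " + fits(mappers, 4000, serverMb));
        // New setting: 2000M per mapper.
        System.out.println("2000M per mapper fits: " + fits(mappers, 2000, serverMb));
    }
}
```

The sketch counts only JVM heap. The operating system and the Hadoop daemons on the same machine also need memory, so a result of true at exactly 4000M should be read as a best case.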

Now I have set both out of core options and get the following exception:

2013-12-05 23:10:18,568 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312052304_0001_m_000001_0' to tip task_201312052304_0001_m_000001, for tracker 'tracker_hduser:localhost/127.0.0.1:39793'
2013-12-05 23:10:27,645 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312052304_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
    at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:281)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:325)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:244)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
    ... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 0
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
    at java.util.concurrent.FutureTask.get(FutureTask.java:119)
    at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
    ... 16 more
Caused by: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 0
    at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:243)
    at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:276)
    at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:172)
    at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:228)
    ... 13 more
Caused by: java.lang.NullPointerException
    at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:692)
    at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:658)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at org.apache.giraph.partition.DiskBackedPartitionStore$DirectExecutorService.execute(DiskBackedPartitionStore.java:972)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
    ... 14 more
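
The trace above wraps the failure in several "Caused by" layers; the innermost one is the NullPointerException raised in DiskBackedPartitionStore$GetPartition.call. As a minimal sketch of how such a chain can be unwrapped programmatically, the following uses only the standard Throwable.getCause() method. RootCause is a hypothetical helper name, and the exceptions built in main merely imitate the nesting seen in the trace; they are not produced by Giraph.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutionException;

// Hypothetical helper for illustration only; not part of Giraph or Hadoop.
final class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable of(Throwable t) {
        // Identity-based set guards against a cause chain that loops back on itself.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Imitation of the nesting in the trace, built from the inside out.
        Throwable npe = new NullPointerException();
        Throwable inner = new ExecutionException(npe);
        Throwable partition = new IllegalStateException(
                "getOrCreatePartition: cannot retrieve partition 0", inner);
        Throwable waiting = new IllegalStateException(
                "waitFor: ExecutionException occurred", new ExecutionException(partition));
        Throwable outer = new IllegalStateException(
                "run: Caught an unrecoverable exception", waiting);

        System.out.println(RootCause.of(outer).getClass().getName());
    }
}
```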


Thanks,
Sebastian


2013/12/5 Ameya Vilankar <ameya.vilankar@gmail.com>

> Each worker is allocated the memory set in mapred.child.java.opts, which in
> your case is 4000M. Check whether your server has enough memory for 2
> Mappers. Also, the out-of-core option is available in two forms:
> 1. Out-of-core graph
> 2. Out-of-core messages
>
> Currently you are setting only the out-of-core graph option and not the
> out-of-core messages option. Enable both of them. More information about
> the options can be found here: http://giraph.apache.org/options.html
> Set -D giraph.useOutOfCoreGraph=true -D giraph.useOutOfCoreMessages=true
> when passing options to GiraphRunner.
>
> Thanks,
> Ameya
>
>
> On Thu, Dec 5, 2013 at 12:39 PM, Sebastian Stipkovic <
> sebastian.stipkovic@gmail.com> wrote:
>
>> Hello,
>>
>> I have set up Giraph 1.1.0 with hadoop-0.20.203.0rc1 on a single-node
>> cluster. It computes a tiny graph successfully. But if the input graph
>> is huge (5 GB), I get an OutOfMemory (garbage collector) exception,
>> although I had turned on the out-of-core option. The job with the
>> out-of-core option only works well with a tiny graph (0.9 GB). What
>> is wrong? Do I have to configure anything further?
>>
>> My Configurations are as follows:
>>
>>
>> name = value
>> fs.s3n.impl = org.apache.hadoop.fs.s3native.NativeS3FileSystem
>> mapred.task.cache.levels = 2
>> giraph.vertexOutputFormatClass = org.apache.giraph.examples.MyShortestPaths$MyOutputFormat
>> hadoop.tmp.dir = /app/hadoop/tmp
>> hadoop.native.lib = true
>> map.sort.class = org.apache.hadoop.util.QuickSort
>> dfs.namenode.decommission.nodes.per.interval = 5
>> dfs.https.need.client.auth = false
>> ipc.client.idlethreshold = 4000
>> dfs.datanode.data.dir.perm = 755
>> mapred.system.dir = ${hadoop.tmp.dir}/mapred/system
>> mapred.job.tracker.persist.jobstatus.hours = 0
>> dfs.datanode.address = 0.0.0.0:50010
>> dfs.namenode.logging.level = info
>> dfs.block.access.token.enable = false
>> io.skip.checksum.errors = false
>> fs.default.name = hdfs://localhost:54310
>> mapred.cluster.reduce.memory.mb = -1
>> mapred.child.tmp = ./tmp
>> fs.har.impl.disable.cache = true
>> dfs.safemode.threshold.pct = 0.999f
>> mapred.skip.reduce.max.skip.groups = 0
>> dfs.namenode.handler.count = 10
>> dfs.blockreport.initialDelay = 0
>> mapred.heartbeats.in.second = 100
>> mapred.tasktracker.dns.nameserver = default
>> io.sort.factor = 10
>> mapred.task.timeout = 600000
>> giraph.maxWorkers = 1
>> mapred.max.tracker.failures = 4
>> hadoop.rpc.socket.factory.class.default = org.apache.hadoop.net.StandardSocketFactory
>> mapred.job.tracker.jobhistory.lru.cache.size = 5
>> fs.hdfs.impl = org.apache.hadoop.hdfs.DistributedFileSystem
>> mapred.queue.default.acl-administer-jobs = *
>> dfs.block.access.key.update.interval = 600
>> mapred.skip.map.auto.incr.proc.count = true
>> mapreduce.job.complete.cancel.delegation.tokens = true
>> io.mapfile.bloom.size = 1048576
>> mapreduce.reduce.shuffle.connect.timeout = 180000
>> dfs.safemode.extension = 30000
>> mapred.jobtracker.blacklist.fault-timeout-window = 180
>> tasktracker.http.threads = 40
>> mapred.job.shuffle.merge.percent = 0.66
>> mapreduce.inputformat.class = org.apache.giraph.bsp.BspInputFormat
>> fs.ftp.impl = org.apache.hadoop.fs.ftp.FTPFileSystem
>> user.name = hduser
>> mapred.output.compress = false
>> io.bytes.per.checksum = 512
>> giraph.isStaticGraph = true
>> mapred.healthChecker.script.timeout = 600000
>> topology.node.switch.mapping.impl = org.apache.hadoop.net.ScriptBasedMapping
>> dfs.https.server.keystore.resource = ssl-server.xml
>> mapred.reduce.slowstart.completed.maps = 0.05
>> mapred.reduce.max.attempts = 4
>> fs.ramfs.impl = org.apache.hadoop.fs.InMemoryFileSystem
>> dfs.block.access.token.lifetime = 600
>> dfs.name.edits.dir = ${dfs.name.dir}
>> mapred.skip.map.max.skip.records = 0
>> mapred.cluster.map.memory.mb = -1
>> hadoop.security.group.mapping = org.apache.hadoop.security.ShellBasedUnixGroupsMapping
>> mapred.job.tracker.persist.jobstatus.dir = /jobtracker/jobsInfo
>> mapred.jar = hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001/job.jar
>> dfs.block.size = 67108864
>> fs.s3.buffer.dir = ${hadoop.tmp.dir}/s3
>> job.end.retry.attempts = 0
>> fs.file.impl = org.apache.hadoop.fs.LocalFileSystem
>> mapred.local.dir.minspacestart = 0
>> mapred.output.compression.type = RECORD
>> dfs.datanode.ipc.address = 0.0.0.0:50020
>> dfs.permissions = true
>> topology.script.number.args = 100
>> io.mapfile.bloom.error.rate = 0.005
>> mapred.cluster.max.reduce.memory.mb = -1
>> mapred.max.tracker.blacklists = 4
>> mapred.task.profile.maps = 0-2
>> dfs.datanode.https.address = 0.0.0.0:50475
>> mapred.userlog.retain.hours = 24
>> dfs.secondary.http.address = 0.0.0.0:50090
>> dfs.replication.max = 512
>> mapred.job.tracker.persist.jobstatus.active = false
>> hadoop.security.authorization = false
>> local.cache.size = 10737418240
>> dfs.namenode.delegation.token.renew-interval = 86400000
>> mapred.min.split.size = 0
>> mapred.map.tasks = 2
>> mapred.child.java.opts = -Xmx4000m
>> mapreduce.job.counters.limit = 120
>> dfs.https.client.keystore.resource = ssl-client.xml
>> mapred.job.queue.name = default
>> dfs.https.address = 0.0.0.0:50470
>> mapred.job.tracker.retiredjobs.cache.size = 1000
>> dfs.balance.bandwidthPerSec = 1048576
>> ipc.server.listen.queue.size = 128
>> mapred.inmem.merge.threshold = 1000
>> job.end.retry.interval = 30000
>> mapred.skip.attempts.to.start.skipping = 2
>> fs.checkpoint.dir = ${hadoop.tmp.dir}/dfs/namesecondary
>> mapred.reduce.tasks = 0
>> mapred.merge.recordsBeforeProgress = 10000
>> mapred.userlog.limit.kb = 0
>> mapred.job.reduce.memory.mb = -1
>> dfs.max.objects = 0
>> webinterface.private.actions = false
>> io.sort.spill.percent = 0.80
>> mapred.job.shuffle.input.buffer.percent = 0.70
>> mapred.job.name = Giraph: org.apache.giraph.examples.MyShortestPaths
>> dfs.datanode.dns.nameserver = default
>> mapred.map.tasks.speculative.execution = false
>> hadoop.util.hash.type = murmur
>> dfs.blockreport.intervalMsec = 3600000
>> mapred.map.max.attempts = 0
>> mapreduce.job.acl-view-job =
>> dfs.client.block.write.retries = 3
>> mapred.job.tracker.handler.count = 10
>> mapreduce.reduce.shuffle.read.timeout = 180000
>> mapred.tasktracker.expiry.interval = 600000
>> dfs.https.enable = false
>> mapred.jobtracker.maxtasks.per.job = -1
>> mapred.jobtracker.job.history.block.size = 3145728
>> giraph.useOutOfCoreGiraph = true
>> keep.failed.task.files = false
>> mapreduce.outputformat.class = org.apache.giraph.bsp.BspOutputFormat
>> dfs.datanode.failed.volumes.tolerated = 0
>> ipc.client.tcpnodelay = false
>> mapred.task.profile.reduces = 0-2
>> mapred.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
>> io.map.index.skip = 0
>> mapred.working.dir = hdfs://localhost:54310/user/hduser
>> ipc.server.tcpnodelay = false
>> mapred.jobtracker.blacklist.fault-bucket-width = 15
>> dfs.namenode.delegation.key.update-interval = 86400000
>> mapred.used.genericoptionsparser = true
>> mapred.mapper.new-api = true
>> mapred.job.map.memory.mb = -1
>> giraph.vertex.input.dir = hdfs://localhost:54310/user/hduser/output
>> dfs.default.chunk.view.size = 32768
>> hadoop.logfile.size = 10000000
>> mapred.reduce.tasks.speculative.execution = true
>> mapreduce.job.dir = hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001
>> mapreduce.tasktracker.outofband.heartbeat = false
>> mapreduce.reduce.input.limit = -1
>> dfs.datanode.du.reserved = 0
>> hadoop.security.authentication = simple
>> fs.checkpoint.period = 3600
>> dfs.web.ugi = webuser,webgroup
>> mapred.job.reuse.jvm.num.tasks = 1
>> mapred.jobtracker.completeuserjobs.maximum = 100
>> dfs.df.interval = 60000
>> dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
>> mapred.task.tracker.task-controller = org.apache.hadoop.mapred.DefaultTaskController
>> giraph.minWorkers = 1
>> fs.s3.maxRetries = 4
>> dfs.datanode.dns.interface = default
>> mapred.cluster.max.map.memory.mb = -1
>> dfs.support.append = false
>> mapreduce.job.acl-modify-job =
>> dfs.permissions.supergroup = supergroup
>> mapred.local.dir = ${hadoop.tmp.dir}/mapred/local
>> fs.hftp.impl = org.apache.hadoop.hdfs.HftpFileSystem
>> fs.trash.interval = 0
>> fs.s3.sleepTimeSeconds = 10
>> dfs.replication.min = 1
>> mapred.submit.replication = 10
>> fs.har.impl = org.apache.hadoop.fs.HarFileSystem
>> mapred.map.output.compression.codec = org.apache.hadoop.io.compress.DefaultCodec
>> mapred.tasktracker.dns.interface = default
>> dfs.namenode.decommission.interval = 30
>> dfs.http.address = 0.0.0.0:50070
>> dfs.heartbeat.interval = 3
>> mapred.job.tracker = localhost:54311
>> mapreduce.job.submithost = hduser
>> io.seqfile.sorter.recordlimit = 1000000
>> giraph.vertexInputFormatClass = org.apache.giraph.examples.MyShortestPaths$MyInputFormat
>> dfs.name.dir = ${hadoop.tmp.dir}/dfs/name
>> mapred.line.input.format.linespermap = 1
>> mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.JobQueueTaskScheduler
>> dfs.datanode.http.address = 0.0.0.0:50075
>> mapred.local.dir.minspacekill = 0
>> dfs.replication.interval = 3
>> io.sort.record.percent = 0.05
>> fs.kfs.impl = org.apache.hadoop.fs.kfs.KosmosFileSystem
>> mapred.temp.dir = ${hadoop.tmp.dir}/mapred/temp
>> mapred.tasktracker.reduce.tasks.maximum = 2
>> mapreduce.job.user.classpath.first = true
>> dfs.replication = 1
>> fs.checkpoint.edits.dir = ${fs.checkpoint.dir}
>> giraph.computationClass = org.apache.giraph.examples.MyShortestPaths
>> mapred.tasktracker.tasks.sleeptime-before-sigkill = 5000
>> mapred.job.reduce.input.buffer.percent = 0.0
>> mapred.tasktracker.indexcache.mb = 10
>> mapreduce.job.split.metainfo.maxsize = 10000000
>> hadoop.logfile.count = 10
>> mapred.skip.reduce.auto.incr.proc.count = true
>> mapreduce.job.submithostaddress = 127.0.1.1
>> io.seqfile.compress.blocksize = 1000000
>> fs.s3.block.size = 67108864
>> mapred.tasktracker.taskmemorymanager.monitoring-interval = 5000
>> giraph.minPercentResponded = 100.0
>> mapred.queue.default.state = RUNNING
>> mapred.acls.enabled = false
>> mapreduce.jobtracker.staging.root.dir = ${hadoop.tmp.dir}/mapred/staging
>> mapred.queue.names = default
>> dfs.access.time.precision = 3600000
>> fs.hsftp.impl = org.apache.hadoop.hdfs.HsftpFileSystem
>> mapred.task.tracker.http.address = 0.0.0.0:50060
>> mapred.reduce.parallel.copies = 5
>> io.seqfile.lazydecompress = true
>> mapred.output.dir = /user/hduser/output/shortestpaths
>> io.sort.mb = 100
>> ipc.client.connection.maxidletime = 10000
>> mapred.compress.map.output = false
>> hadoop.security.uid.cache.secs = 14400
>> mapred.task.tracker.report.address = 127.0.0.1:0
>> mapred.healthChecker.interval = 60000
>> ipc.client.kill.max = 10
>> ipc.client.connect.max.retries = 10
>> ipc.ping.interval = 300000
>> mapreduce.user.classpath.first = true
>> mapreduce.map.class = org.apache.giraph.graph.GraphMapper
>> fs.s3.impl = org.apache.hadoop.fs.s3.S3FileSystem
>> mapred.user.jobconf.limit = 5242880
>> mapred.job.tracker.http.address = 0.0.0.0:50030
>> io.file.buffer.size = 4096
>> mapred.jobtracker.restart.recover = false
>> io.serializations = org.apache.hadoop.io.serializer.WritableSerialization
>> dfs.datanode.handler.count = 3
>> mapred.reduce.copy.backoff = 300
>> mapred.task.profile = false
>> dfs.replication.considerLoad = true
>> jobclient.output.filter = FAILED
>> dfs.namenode.delegation.token.max-lifetime = 604800000
>> mapred.tasktracker.map.tasks.maximum = 4
>> io.compression.codecs = org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec
>> fs.checkpoint.size = 67108864
>>
>> Additionally, if I use more than one worker, I get an exception too. Is
>> my configuration wrong?
>>
>>
>> best regards,
>> Sebastian
>>
>
>
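
The quoted reply names the options giraph.useOutOfCoreGraph and giraph.useOutOfCoreMessages, while the configuration listing in the first message contains the key giraph.useOutOfCoreGiraph. The thread does not establish whether this difference is related to the failure. As a minimal sketch, assuming the job configuration is available as a plain key/value map, a check like the following can flag such mismatches. OutOfCoreOptionCheck is a hypothetical helper, not a Giraph or Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Giraph or Hadoop API.
final class OutOfCoreOptionCheck {

    // Option names exactly as written in the quoted reply.
    static final List<String> EXPECTED = Arrays.asList(
            "giraph.useOutOfCoreGraph",
            "giraph.useOutOfCoreMessages");

    /** Expected options that are absent or not set to "true". */
    static List<String> missing(Map<String, String> conf) {
        List<String> result = new ArrayList<String>();
        for (String key : EXPECTED) {
            if (!"true".equals(conf.get(key))) {
                result.add(key);
            }
        }
        return result;
    }

    /** Keys that start like an out-of-core option but are not in EXPECTED, sorted. */
    static List<String> unrecognized(Map<String, String> conf) {
        List<String> result = new ArrayList<String>();
        for (String key : conf.keySet()) {
            if (key.startsWith("giraph.useOutOfCore") && !EXPECTED.contains(key)) {
                result.add(key);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        // Key as it appears in the configuration listing of the first message.
        conf.put("giraph.useOutOfCoreGiraph", "true");

        System.out.println("missing: " + missing(conf));
        System.out.println("unrecognized: " + unrecognized(conf));
    }
}
```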
