hama-user mailing list archives

From Ammar Sahib <ammar.sa...@yahoo.com>
Subject Re: loading vertices into RAM
Date Tue, 21 Jan 2014 14:58:38 GMT
Hi Edward

I tried running my program with the DiskVerticesInfo option on a cluster of 5 virtual
machines, each with 4 GB of RAM. I set the heap size to 2048 MB (-Xmx2048m).

I am working with a graph that consists of 10 million vertices. After around 3 hours I get a
Java heap space error. Do you think that using virtual machines instead of physical machines
might have something to do with this problem?
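
A rough sketch of the heap budget these figures imply, added here only as back-of-the-envelope arithmetic. It assumes the worst case in which every vertex is held on the heap, and the five-way even split is an assumption (one task per machine). The LocalBSPRunner frames in the trace below suggest the job may have run inside a single local JVM rather than across the five machines; if so, the first figure would be the relevant one.

```java
public class HeapBudgetSketch {
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Average number of heap bytes available per vertex when `vertices`
    // vertices share a heap of `heapMb` megabytes (integer division).
    static long bytesPerVertex(long heapMb, long vertices) {
        return (heapMb * BYTES_PER_MB) / vertices;
    }

    public static void main(String[] args) {
        long heapMb = 2048;           // -Xmx2048m, from the message
        long vertices = 10_000_000L;  // graph size, from the message
        int tasks = 5;                // assumption: one task per VM, even partitioning

        // Budget if all vertices end up in one JVM.
        System.out.println(bytesPerVertex(heapMb, vertices));
        // Budget per task if the graph is split evenly across five JVMs.
        System.out.println(bytesPerVertex(heapMb, vertices / tasks));
    }
}
```

The budget has to cover vertex IDs, values, edges and in-flight messages, so a few hundred bytes per vertex leaves little headroom regardless of whether the machines are virtual or physical.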

The error I get:
14/01/21 15:22:23 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
        at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        at org.apache.hama.bsp.LocalBSPRunner$ThreadObserver.run(LocalBSPRunner.java:313)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.OutOfMemoryError: Java heap space



My configuration file content:

<configuration>
<property>
    <name>bsp.master.address</name>
    <value>master</value>
</property>
<property>
    <name>bsp.system.dir</name>
    <value>/tmp/hama-hadoop/bsp/system</value>
</property>
<property>
    <name>bsp.local.dir</name>
    <value>/tmp/hama-hadoop/bsp/local</value>
</property>
<property>
    <name>hama.tmp.dir</name>
    <value>/tmp/hama-hadoop</value>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
</property>
<property>
    <name>hama.zookeeper.quorum</name>
    <value>master,slave1,slave2,slave3,slave4</value>
</property>
<property>
    <name>bsp.child.java.opts</name>
    <value>-Xmx2048m</value>
</property>
</configuration>





On Tuesday, January 21, 2014 2:52 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
 
To use OffHeapVerticesInfo, you need to add the Apache DirectMemory
libraries to the lib folder.

Or, try DiskVerticesInfo.

With the trunk version, I was able to run a 30 thousand vertex graph on
a single machine, and 1B vertices on a full-rack cluster (child opts:
-Xmx2048m).
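
A minimal sketch of how one might confirm whether the DirectMemory classes are visible before switching to OffHeapVerticesInfo. The class name is taken from the NoClassDefFoundError quoted further down; the helper `isOnClasspath` is a hypothetical name for illustration and is not part of Hama.

```java
public class ClasspathCheckSketch {
    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class reported missing in the stack trace quoted in this thread.
        String needed = "org.apache.directmemory.utils.CacheValuesIterable";
        System.out.println(needed + " on classpath: " + isOnClasspath(needed));
    }
}
```

The check only covers the JVM it runs in. On a cluster, the classpath of the task JVMs on each node may differ from the one used to submit the job.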


On Tue, Jan 21, 2014 at 1:57 AM, Ammar Sahib <ammar.sahib@yahoo.com> wrote:
> Hi
>
> Thanks for the reply. I am using the HAMA version from TRUNK and I am running my
> own algorithm. I am trying to work with a graph that consists of 10 million vertices.
> Has anyone worked with big graphs (millions of vertices) using HAMA? Can you
> please share your experience?
>
>
> I am trying now to use:
>
> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.OffHeapVerticesInfo.class, org.apache.hama.graph.VerticesInfo.class);
>
>
> I get the error:
>
>
> 14/01/20 17:42:16 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
> java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/directmemory/utils/CacheValuesIterable
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>         at org.apache.hama.bsp.LocalBSPRunner$ThreadObserver.run(LocalBSPRunner.java:313)
>         at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NoClassDefFoundError: org/apache/directmemory/utils/CacheValuesIterable
>         at org.apache.hama.graph.OffHeapVerticesInfo.skippingIterator(OffHeapVerticesInfo.java:112)
>         at org.apache.hama.graph.GraphJobRunner.cleanup(GraphJobRunner.java:163)
>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:262)
>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>         at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> The log of my master is as follows:
>
> /************************************************************
> STARTUP_MSG: Starting BSPMaster
> STARTUP_MSG:   host = c3-large1-master/10.255.255.2
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.2.0
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May  6 18:29:07 UTC 2013
> STARTUP_MSG:   java = 1.7.0_25
> ************************************************************/
> 2014-01-14 21:27:35,808 INFO org.apache.hama.bsp.BSPMaster: RPC BSPMaster: host master port 40000
> 2014-01-14 21:27:37,200 INFO org.apache.hama.ipc.Server: Starting Socket Reader #1 for port 40000
> 2014-01-14 21:27:37,732 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2014-01-14 21:27:38,147 INFO org.apache.hama.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 40013
> 2014-01-14 21:27:38,168 INFO org.apache.hama.http.HttpServer: listener.getLocalPort() returned 40013 webServer.getConnectors()[0].getLocalPort() returned 40013
> 2014-01-14 21:27:38,168 INFO org.apache.hama.http.HttpServer: Jetty bound to port 40013
> 2014-01-14 21:27:38,168 INFO org.mortbay.log: jetty-6.1.14
> 2014-01-14 21:27:38,446 INFO org.mortbay.log: Extract jar:file:/usr/local/hama-0.6.3/hama-core-0.6.3.jar!/webapp/bspmaster/ to /tmp/Jetty_master_40013_bspmaster____ge2lxf/webapp
> 2014-01-14 21:27:40,162 INFO org.mortbay.log: Started SelectChannelConnector@master:40013
> 2014-01-14 21:27:40,734 INFO org.apache.hama.bsp.BSPMaster: Cleaning up the system directory
> 2014-01-14 21:27:40,734 INFO org.apache.hama.bsp.BSPMaster: hdfs://master:54310/tmp/hama-hadoop/bsp/system
> 2014-01-14 21:27:40,991 INFO org.apache.hama.bsp.sync.ZKSyncBSPMasterClient: Initialized ZK false
> 2014-01-14 21:27:40,991 INFO org.apache.hama.bsp.sync.ZKSyncClient: Initializing ZK Sync Client
> 2014-01-14 21:27:41,073 INFO org.apache.hama.ipc.Server: IPC Server Responder: starting
> 2014-01-14 21:27:41,077 INFO org.apache.hama.ipc.Server: IPC Server listener on 40000: starting
> 2014-01-14 21:27:41,085 INFO org.apache.hama.ipc.Server: IPC Server handler 0 on 40000: starting
> 2014-01-14 21:27:41,088 INFO org.apache.hama.bsp.BSPMaster: Starting RUNNING
> 2014-01-14 21:27:41,168 INFO org.apache.hama.bsp.BSPMaster: groomd_slave2_50000 is added.
> 2014-01-14 21:27:49,634 INFO org.apache.hama.bsp.BSPMaster: groomd_slave1_50000 is added.
> 2014-01-14 21:28:15,943 INFO org.apache.hama.bsp.BSPMaster: groomd_master_50000 is added.
>
>
>
>
>
>
>
> On Sunday, January 19, 2014 7:58 AM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>
> yes, the correct way of setting OffHeapVI is:
> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.OffHeapVerticesInfo.class, org.apache.hama.graph.VerticesInfo.class);
>
> Apart from that, what Hama version are you running?
> Looking at the code in trunk, an NPE on currentVertex shouldn't be possible
> if the iterator is consumed correctly. If, however, one doesn't call hasNext
> before next, and/or calls next even though hasNext returned false, then
> that NPE is possible.
> Also, what algorithm / example are you running? Any additional information (like
> environment, execution mode, logs, version, etc.) would make it easier to help
> you.
>
> Tommaso
>
>
>
>
> 2014/1/19 步青云 <mailliuping@qq.com>
>
>> I ran into the same problem when loading vertices into RAM, and I tried to use
>> OffHeapVerticesInfo.
>> You can use the setClass method like this:
>> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.OffHeapVerticesInfo.class, org.apache.hama.graph.VerticesInfo.class);
>> However, I got a NullPointerException when using OffHeapVerticesInfo. The errors are
>> as follows:
>>
>> 14/01/18 20:54:23 ERROR bsp.LocalBSPRunner: Exception during BSP execution!
>> java.lang.NullPointerException
>>     at org.apache.hama.graph.OffHeapVerticesInfo$1.next(OffHeapVerticesInfo.java:139)
>>     at org.apache.hama.graph.GraphJobRunner.doSuperstep(GraphJobRunner.java:251)
>>     at org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:145)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
>>     at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:286)
>>
>> Could anyone help me solve this problem? Thanks a lot.
>>
>>
>>
>>
>> ------------------ Original ------------------
>> From:  "Ammar Sahib"<ammar.sahib@yahoo.com>;
>> Date:  Jan 17, 2014
>> To:  "user@hama.apache.org"<user@hama.apache.org>;
>>
>> Subject:  Re: loading vertices into RAM
>>
>>
>>
>> I think we are getting close now. However, now I get a runtime exception:
>>
>> Exception in thread "main" java.lang.RuntimeException: interface org.apache.hama.graph.VerticesInfo not org.apache.hama.graph.ListVerticesInfo
>>     at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:858)
>>
>>
>>
>>
>>
>> On Friday, January 17, 2014 2:30 PM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>>
>> Ah yes, sorry, you also have to specify the interface. I don't have the
>> code in front of me, but it should be:
>>
>> conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.VerticesInfo.class, org.apache.hama.graph.ListVerticesInfo.class);
>>
>> Tommaso
>>
>>
>>
>> 2014/1/17 Ammar Sahib <ammar.sahib@yahoo.com>
>>
>> > Hi
>> >
>> >
>> > Thanks for your reply. I am now using:
>> >
>> > conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.ListVerticesInfo.class);
>> >
>> > Now I get this error:
>> > The method setClass(String, Class<?>, Class<?>) in the type Configuration
>> > is not applicable for the arguments (String, Class<ListVerticesInfo>)
>> >
>> > I am using HAMA 0.6.3
>> >
>> >
>> >
>> >
>> >
>> > On Friday, January 17, 2014 12:59 PM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>> >
>> > You're passing the fully qualified name of the class as a String to the
>> > method setClass(String, Class), while you should pass the Class itself,
>> > e.g.:
>> > HamaConfiguration conf = new HamaConfiguration();
>> > conf.setClass("hama.graph.vertices.info", org.apache.hama.graph.ListVerticesInfo.class);
>> >
>> > Hope this helps,
>> > Tommaso
>> >
>> >
>> >
>> >
>> > 2014/1/17 Ammar Sahib <ammar.sahib@yahoo.com>
>> >
>> > > Hi
>> > >
>> > > I am trying to evaluate the different implementations below:
>> > >
>> > >
>> > > - ListVerticesInfo: loads vertices into an array list.
>> > > - MapVerticesInfo: loads vertices into a tree map.
>> > > - DiskVerticesInfo: loads vertices into a local file.
>> > >
>> > > When using the conf.setClass method I got an error. Below is a sample of my
>> > > code:
>> > > HamaConfiguration conf = new HamaConfiguration();
>> > > conf.setClass("hama.graph.vertices.info", "org.apache.hama.graph.ListVerticesInfo");
>> > >
>> > > The error I am getting is:
>> > > The method setClass(String, Class<?>, Class<?>) in the type Configuration
>> > > is not applicable for the arguments (String, String).
>> > >
>> > > However, I found that I can use the conf.set method.
>> > >
>> > >
>> > > Can someone tell me what I am doing wrong?
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wednesday, January 15, 2014 8:01 AM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:
>> > >
>> > > and OffHeapVerticesInfo for loading vertices off heap, which is available
>> > > with 0.6.3 as well if I recall correctly.
>> > > Tommaso
>> > >
>> > >
>> > >
>> > > 2014/1/15 Edward J. Yoon <edwardyoon@apache.org>
>> > >
>> > > > There are a few implementations.
>> > > >
>> > > >  - ListVerticesInfo: loads vertices into an array list.
>> > > >  - MapVerticesInfo: loads vertices into a tree map.
>> > > >  - DiskVerticesInfo: loads vertices into a local file.
>> > > >
>> > > > You can choose one of them by setting "hama.graph.vertices.info"
>> > > > in the job configuration:
>> > > >
>> > > >   conf.setClass("hama.graph.vertices.info", "org.apache.hama.graph.ListVerticesInfo");
>> > > >
>> > > > With the latest 0.6.3 version, you can use only ListVerticesInfo.
>> > > > Please use the TRUNK.
>> > > >
>> > > >
>> > > > On Tue, Jan 14, 2014 at 11:18 PM, Ammar Sahib <ammar.sahib@yahoo.com> wrote:
>> > > > > Hi
>> > > > >
>> > > > > According to the BSP model, the data is processed in RAM, and that is
>> > > > > the reason why the Pregel model is faster than MapReduce (MapReduce
>> > > > > writes to disk). Can someone explain to me how to be sure that all the
>> > > > > graph vertices have actually been loaded into RAM?
>> > > > >
>> > > > >
>> > > > > How would HAMA behave if the vertex values are so big that the
>> > > > > available RAM is not enough to contain all of the vertices?
>> > > > >
>> > > > > Regards
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best Regards, Edward J. Yoon
>> > > > @eddieyoon

>> > > >
>> > >
>> >
>>
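
The setClass errors quoted in the thread above come down to argument order. As far as I understand Hadoop's Configuration.setClass(String name, Class<?> theClass, Class<?> xface), the implementation comes second and the interface third, and the call is rejected when the interface is not assignable from the implementation. The sketch below reproduces only that check with stand-in types; `VerticesInfo` and `ListVerticesInfo` here are local placeholders, not the Hama classes, and `accepts` is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;

public class SetClassOrderSketch {
    // Local stand-ins for the Hama interface and one implementation.
    public interface VerticesInfo {}
    public static class ListVerticesInfo implements VerticesInfo {}

    private final Map<String, String> props = new HashMap<>();

    // Same shape as the check implied by the quoted error message:
    // theClass must implement (be assignable to) xface.
    void setClass(String name, Class<?> theClass, Class<?> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException(theClass + " not " + xface.getName());
        }
        props.put(name, theClass.getName());
    }

    // Hypothetical helper: reports whether a given argument order is accepted.
    static boolean accepts(Class<?> theClass, Class<?> xface) {
        try {
            new SetClassOrderSketch().setClass("hama.graph.vertices.info", theClass, xface);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Implementation first, interface second: accepted.
        System.out.println(accepts(ListVerticesInfo.class, VerticesInfo.class));
        // Swapped order, as in the call that produced the RuntimeException: rejected.
        System.out.println(accepts(VerticesInfo.class, ListVerticesInfo.class));
    }
}
```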



-- 
Best Regards, Edward J. Yoon
@eddieyoon