hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: hama processes
Date Sun, 13 Sep 2015 23:19:55 GMT
Thanks, have a nice holiday. :-)

On Sat, Sep 12, 2015 at 7:04 PM, Behroz Sikander <behroz89@gmail.com> wrote:
> Hi,
> I also added the swap space and the algorithm ran a bit longer, but
> after some time I faced the same problem again. I am on holidays and
> will figure out the issue in the next few days. I will keep you guys
> updated.
>
> Regards,
> Behroz
>
>
>
> On Fri, Sep 11, 2015 at 6:36 AM, Edward J. Yoon <edwardyoon@apache.org>
> wrote:
>
>> Hi, I think you have to add some swap space. Did you figure out what
>> the problem was?
>>
>> On Fri, Sep 4, 2015 at 8:20 AM, Behroz Sikander <behroz89@gmail.com>
>> wrote:
>> > More info on this:
>> > I noticed that only 2 machines were failing with OutOfMemory. After
>> > messing around, I found out that the swap memory was 0 for these 2
>> > machines but the others had swap space of 1 GB. I added the swap to
>> > these machines and it worked. But, as expected, in the next run of
>> > the algorithm with more data it crashed again. This time the
>> > GroomChildProcess crashed with the following log message:
>> >
>> > OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007fa100000, 42467328, 0) failed; error='Cannot allocate memory' (errno=12)
>> > #
>> > # There is insufficient memory for the Java Runtime Environment to continue.
>> > # Native memory allocation (malloc) failed to allocate 42467328 bytes for committing reserved memory.
>> > # An error report file with more information is saved as:
>> > # /home/behroz/Documents/Packages/tmp_data/hama_tmp/bsp/local/groomServer/attempt_201509040050_0004_000006_0/work/hs_err_pid28850.log
>> >
>> > My slave machines have 8GB of RAM, 4 CPUs, a 20GB hard drive and 1GB
>> > of swap. I run 3 groom child processes, each taking 2GB of RAM. Apart
>> > from the GroomChildProcess, I have GroomServer, DataNode and
>> > TaskManager running on the slave. After assigning 2GB of RAM to the 3
>> > child groom processes (6GB total), only 2GB of RAM is left for the
>> > others. Do you think this is the problem?
>> >
>> > Regards,
>> > Behroz
>> >
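A knob related to the budget above: the per-groom task count and per-task
heap can also be set from the client configuration instead of editing every
node. A minimal Java sketch, assuming the bsp.tasks.maximum key quoted later
in this thread; the bsp.child.java.opts key for the child JVM heap is an
assumption and should be verified against hama-default.xml:

  import org.apache.hama.HamaConfiguration;

  public class MemoryTuning {
    // Returns a configuration that trades task parallelism for headroom:
    // 2 tasks * 1GB leaves more RAM for GroomServer, DataNode and the OS.
    public static HamaConfiguration tunedConf() {
      HamaConfiguration conf = new HamaConfiguration();
      conf.set("bsp.tasks.maximum", "2");           // down from the default 3
      conf.set("bsp.child.java.opts", "-Xmx1024m"); // key name assumed
      return conf;
    }
  }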
>> > On Thu, Sep 3, 2015 at 11:39 PM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >
>> >> Ok, I found a strange thing. In my Hadoop folder, I found a new file
>> >> named "hs_err_pid4919.log" inside the $HADOOP_HOME directory.
>> >>
>> >> The contents of the file are:
>> >>
>> >> #   Increase physical memory or swap space
>> >> #   Check if swap backing store is full
>> >> #   Use 64 bit Java on a 64 bit OS
>> >> #   Decrease Java heap size (-Xmx/-Xms)
>> >> #   Decrease number of Java threads
>> >> #   Decrease Java thread stack sizes (-Xss)
>> >> #   Set larger code cache with -XX:ReservedCodeCacheSize=
>> >> # This output file may be truncated or incomplete.
>> >> #
>> >> #  Out of Memory Error (os_linux.cpp:2809), pid=4919, tid=140564483778304
>> >> #
>> >> # JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build 1.7.0_79-b14)
>> >> # Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
>> >> # Derivative: IcedTea 2.5.6
>> >> # Distribution: Ubuntu 14.04 LTS, package 7u79-2.5.6-0ubuntu1.14.04.1
>> >> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> >> #
>> >>
>> >> ---------------  T H R E A D  ---------------
>> >>
>> >> Current thread (0x00007fd7c0438800):  JavaThread "PacketResponder:
>> >> BP-1786576942-141.40.254.14-1441293753577:blk_1074136820_396012,
>> >> type=HAS_DOWNSTREAM_IN_PIPELINE" daemon [_thread_new, id=11943,
>> >> stack(0x00007fd7b80fa000,0x00007fd7b81fb000)]
>> >>
>> >> Stack: [0x00007fd7b80fa000,0x00007fd7b81fb000],  sp=0x00007fd7b81f9be0,  free space=1022k
>> >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> >>
>> >> I think my DataNode process is crashing. I now know that it is an
>> >> out-of-memory error, but I am not sure of the reason.
>> >>
>> >> On Thu, Sep 3, 2015 at 10:25 PM, Behroz Sikander <behroz89@gmail.com>
>> >> wrote:
>> >>
>> >>> Ok. HA = High Availability?
>> >>>
>> >>> I am also trying to solve the following problem, but I do not
>> >>> understand why I get the exception, because my algorithm does not
>> >>> send a lot of data to the master:
>> >>> 'BSP task process exit with nonzero status of 1'
>> >>>
>> >>> Each slave node processes some data and sends back a double array of
>> >>> size 96 to the master machine. Recently, I was testing the algorithm
>> >>> on 8000 files when it crashed. This means that 8000 double arrays of
>> >>> size 96 are sent to the master to process. Once the master receives
>> >>> all the data, it comes out of the sync barrier and starts processing.
>> >>> Here is the calculation:
>> >>>
>> >>> 8000 * 96 * 8 bytes (the size of a double) = 6,144,000 bytes = ~6.144 MB
>> >>>
>> >>> I am not sure, but this does not seem to be a lot of data, and I
>> >>> think the message manager that you mentioned should be able to
>> >>> handle it.
>> >>>
>> >>> Regards,
>> >>> Behroz
>> >>>
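For reference, the send/sync/drain pattern described above looks roughly
like the following in Hama's BSP API. This is a minimal sketch, not the
actual job: the NullWritable input/output types, the packing of each
double[96] into a BytesWritable, and the convention of treating peer 0 as
the master are all illustrative assumptions:

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hama.bsp.BSP;
  import org.apache.hama.bsp.BSPPeer;
  import org.apache.hama.bsp.sync.SyncException;

  public class ArrayAggregation extends
      BSP<NullWritable, NullWritable, NullWritable, NullWritable, BytesWritable> {

    @Override
    public void bsp(BSPPeer<NullWritable, NullWritable, NullWritable,
        NullWritable, BytesWritable> peer)
        throws IOException, SyncException, InterruptedException {
      String master = peer.getPeerName(0); // convention: peer 0 is the master

      // Every task packs its double[96] result into a single message.
      double[] result = new double[96];    // computed from the task's files
      ByteBuffer buf = ByteBuffer.allocate(96 * 8);
      for (double d : result) buf.putDouble(d);
      peer.send(master, new BytesWritable(buf.array()));

      peer.sync(); // barrier: all messages are delivered after this point

      if (peer.getPeerName().equals(master)) {
        // 8000 messages * 96 doubles * 8 bytes is only ~6 MB, which the
        // in-memory message manager should comfortably hold.
        BytesWritable msg;
        while ((msg = peer.getCurrentMessage()) != null) {
          ByteBuffer in = ByteBuffer.wrap(msg.getBytes());
          double first = in.getDouble(); // unpack and process...
        }
      }
    }
  }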
>> >>> On Tue, Sep 1, 2015 at 1:07 PM, Edward J. Yoon <edwardyoon@apache.org>
>> >>> wrote:
>> >>>
>> >>>> I'm reading the GroomServer code and its taskMonitorService. It
>> >>>> seems related to cluster HA.
>> >>>>
>> >>>> On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>>> >> If my Groom Child Process fails for some reason, the processes
>> >>>> >> are not killed automatically
>> >>>> >
>> >>>> > I also experienced this problem before. I guess if one of the
>> >>>> > processes crashes with OutOfMemory, the other processes wait for
>> >>>> > it indefinitely. This is a bug.
>> >>>> >
>> >>>> > On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >>>> >> Just another quick question. If my Groom Child Process fails for
>> >>>> >> some reason, the processes are not killed automatically. If I
>> >>>> >> run the JPS command, I can still see something like
>> >>>> >> "3791 GroomServer$BSPPeerChild". Is this the expected behavior?
>> >>>> >>
>> >>>> >> I am using the latest Hama version (0.7.0).
>> >>>> >> Regards,
>> >>>> >> Behroz
>> >>>> >>
>> >>>> >> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >>>> >>
>> >>>> >>> Ok I will try it out.
>> >>>> >>>
>> >>>> >>> No, actually I am learning a lot by facing these problems. It
>> >>>> >>> is actually a good thing :D
>> >>>> >>>
>> >>>> >>> Regards,
>> >>>> >>> Behroz
>> >>>> >>>
>> >>>> >>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>>> >>>
>> >>>> >>>> > message managers. Hmmm, I will recheck my logic related to
>> >>>> >>>> > messages. Btw
>> >>>> >>>>
>> >>>> >>>> Serialization (like GraphJobMessage) is a good idea. It stores
>> >>>> >>>> multiple messages in serialized form in a single object to
>> >>>> >>>> reduce the memory usage and RPC overhead.
>> >>>> >>>>
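The batching idea described above can be approximated with a custom
Writable that carries many small arrays in one message, in the spirit of
GraphJobMessage. BatchedDoubleArrays below is a hypothetical illustration,
not an actual Hama class:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.io.Writable;

  // Packs many double[] payloads into a single BSP message: fewer
  // message objects and fewer RPC round trips than one array per send.
  public class BatchedDoubleArrays implements Writable {
    private final List<double[]> arrays = new ArrayList<double[]>();

    public void add(double[] a) { arrays.add(a); }
    public List<double[]> get() { return arrays; }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(arrays.size());
      for (double[] a : arrays) {
        out.writeInt(a.length);
        for (double d : a) out.writeDouble(d);
      }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      arrays.clear();
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        double[] a = new double[in.readInt()];
        for (int j = 0; j < a.length; j++) a[j] = in.readDouble();
        arrays.add(a);
      }
    }
  }

Sending one such batch per superstep instead of thousands of tiny messages
reduces the per-message object and RPC overhead that the thread discusses.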
>> >>>> >>>> > what is the limit of these message managers? How much data
>> >>>> >>>> > can they handle at a single time?
>> >>>> >>>>
>> >>>> >>>> It depends on memory.
>> >>>> >>>>
>> >>>> >>>> > P.S. Each day, as I am moving towards a big cluster, I am
>> >>>> >>>> > running into problems (a lot of them :D).
>> >>>> >>>>
>> >>>> >>>> Haha, sorry for the inconvenience and thanks for your reports.
>> >>>> >>>>
>> >>>> >>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >>>> >>>> > Ok. So, I do have a memory problem. I will try to scale out.
>> >>>> >>>> >
>> >>>> >>>> > >> Each task processor has two message managers, one for
>> >>>> >>>> > >> outgoing and one for incoming. All these are handled in
>> >>>> >>>> > >> memory, so it sometimes requires large memory space.
>> >>>> >>>> >
>> >>>> >>>> > So, you mean that before barrier synchronization I have a
>> >>>> >>>> > lot of data in the message managers. Hmmm, I will recheck my
>> >>>> >>>> > logic related to messages. Btw what is the limit of these
>> >>>> >>>> > message managers? How much data can they handle at a single
>> >>>> >>>> > time?
>> >>>> >>>> >
>> >>>> >>>> > P.S. Each day, as I am moving towards a big cluster, I am
>> >>>> >>>> > running into problems (a lot of them :D).
>> >>>> >>>> >
>> >>>> >>>> > Regards,
>> >>>> >>>> > Behroz Sikander
>> >>>> >>>> >
>> >>>> >>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>>> >>>> >
>> >>>> >>>> >> > for 3 Groom child processes + 2GB for the Ubuntu OS). Is
>> >>>> >>>> >> > this a correct understanding?
>> >>>> >>>> >>
>> >>>> >>>> >> and,
>> >>>> >>>> >>
>> >>>> >>>> >> > on a big dataset. I think these exceptions have something
>> >>>> >>>> >> > to do with the Ubuntu OS killing the Hama process due to
>> >>>> >>>> >> > lack of memory. So, I was curious about
>> >>>> >>>> >>
>> >>>> >>>> >> Yes, you're right.
>> >>>> >>>> >>
>> >>>> >>>> >> Each task processor has two message managers, one for
>> >>>> >>>> >> outgoing and one for incoming. All these are handled in
>> >>>> >>>> >> memory, so it sometimes requires large memory space. To
>> >>>> >>>> >> solve the OutOfMemory issue, you should scale out your
>> >>>> >>>> >> cluster by increasing the number of nodes and job tasks, or
>> >>>> >>>> >> optimize your algorithm. Another option is a disk-spillable
>> >>>> >>>> >> message manager, but this is not supported yet.
>> >>>> >>>> >>
>> >>>> >>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >>>> >>>> >> > Hi,
>> >>>> >>>> >> > Yes. According to hama-default.xml, each machine will
>> >>>> >>>> >> > open 3 processes with 2GB of memory each. This means that
>> >>>> >>>> >> > my VMs need at least 8GB of memory (2GB each for 3 Groom
>> >>>> >>>> >> > child processes + 2GB for the Ubuntu OS). Is this a
>> >>>> >>>> >> > correct understanding?
>> >>>> >>>> >> >
>> >>>> >>>> >> > I recently ran into the following exceptions when I was
>> >>>> >>>> >> > trying to run Hama on a big dataset. I think these
>> >>>> >>>> >> > exceptions have something to do with the Ubuntu OS
>> >>>> >>>> >> > killing the Hama process due to lack of memory. So, I was
>> >>>> >>>> >> > curious about my configurations.
>> >>>> >>>> >> > 'BSP task process exit with nonzero status of 137.'
>> >>>> >>>> >> > 'BSP task process exit with nonzero status of 1'
>> >>>> >>>> >> >
>> >>>> >>>> >> > Regards,
>> >>>> >>>> >> > Behroz
>> >>>> >>>> >> >
>> >>>> >>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>>> >>>> >> >
>> >>>> >>>> >> >> Hi,
>> >>>> >>>> >> >>
>> >>>> >>>> >> >> You can change the max tasks per node by setting the
>> >>>> >>>> >> >> property below in hama-site.xml. :-)
>> >>>> >>>> >> >>
>> >>>> >>>> >> >>   <property>
>> >>>> >>>> >> >>     <name>bsp.tasks.maximum</name>
>> >>>> >>>> >> >>     <value>3</value>
>> >>>> >>>> >> >>     <description>The maximum number of BSP tasks that
>> >>>> >>>> >> >>     will be run simultaneously by a groom
>> >>>> >>>> >> >>     server.</description>
>> >>>> >>>> >> >>   </property>
>> >>>> >>>> >> >>
>> >>>> >>>> >> >>
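The task count can also be requested per job at submission time. A hedged
sketch, assuming BSPJob's setNumBspTask and setBspClass methods in Hama
0.7.0 (ArrayAggregation is the illustrative BSP class sketched earlier in
this thread); note that setNumBspTask sets the job-wide task count, while
bsp.tasks.maximum above remains the per-groom ceiling:

  import org.apache.hama.HamaConfiguration;
  import org.apache.hama.bsp.BSPJob;

  public class SubmitWithTaskCount {
    public static void main(String[] args) throws Exception {
      HamaConfiguration conf = new HamaConfiguration();
      BSPJob job = new BSPJob(conf);
      job.setBspClass(ArrayAggregation.class); // the BSP program to run
      // e.g. 3 grooms, each capped at bsp.tasks.maximum = 3 tasks
      job.setNumBspTask(9);
      job.waitForCompletion(true); // submit and block until the job finishes
    }
  }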
>> >>>> >>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >>>> >>>> >> >> > Hi,
>> >>>> >>>> >> >> > Recently, I noticed that my Hama deployment is only
>> >>>> >>>> >> >> > opening 3 processes per machine. This is because of
>> >>>> >>>> >> >> > the configuration settings in the default Hama file.
>> >>>> >>>> >> >> >
>> >>>> >>>> >> >> > My question is: why 3 and not 5 or 7? What criteria
>> >>>> >>>> >> >> > should be considered if I want to increase the value?
>> >>>> >>>> >> >> >
>> >>>> >>>> >> >> > Regards,
>> >>>> >>>> >> >> > Behroz
>> >>>> >>>> >> >>
>> >>>> >>>> >> >>
>> >>>> >>>> >> >>
>> >>>> >>>> >> >> --
>> >>>> >>>> >> >> Best Regards, Edward J. Yoon
>> >>>> >>>> >> >>
>> >>>> >>>> >>
>> >>>> >>>> >>
>> >>>> >>>> >>
>> >>>> >>>> >> --
>> >>>> >>>> >> Best Regards, Edward J. Yoon
>> >>>> >>>> >>
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> --
>> >>>> >>>> Best Regards, Edward J. Yoon
>> >>>> >>>>
>> >>>> >>>
>> >>>> >>>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > Best Regards, Edward J. Yoon
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Best Regards, Edward J. Yoon
>> >>>>
>> >>>
>> >>>
>> >>
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>



-- 
Best Regards, Edward J. Yoon
