hadoop-yarn-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: question about cpu utilization
Date Fri, 10 May 2013 15:04:50 GMT
The CPU scheduling is still kind of fuzzy.  Your request is done in
virtual cores, which do not necessarily correspond to actual physical
cores.  In some cases Linux cgroups may be used to guarantee that you will
get at least a certain level of CPU time, but nothing I am aware of right
now will actually bind the process to a given core.
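
When cgroups-based CPU isolation is enabled, the NodeManager (in Hadoop 2.x)
translates a container's vcore request into a relative cpu.shares weight
rather than pinning cores. A minimal sketch of that mapping, assuming the
conventional cgroup default weight of 1024 (the class and method names here
are illustrative, not Hadoop's):

```java
// Sketch of how a vcore request can map to a cgroup cpu.shares weight,
// roughly mirroring YARN's cgroups resource handler. The constant 1024
// is the cgroup v1 default weight; the exact behavior depends on the
// Hadoop version and NodeManager configuration.
public class VcoreShares {
    static final int DEFAULT_WEIGHT = 1024; // cgroup v1 default cpu.shares

    // cpu.shares is a relative weight: under contention, a container
    // asking for 2 vcores gets roughly twice the CPU time of a 1-vcore
    // container, but it is never bound to specific physical cores.
    static int cpuShares(int vcores) {
        return DEFAULT_WEIGHT * vcores;
    }

    public static void main(String[] args) {
        System.out.println(cpuShares(1)); // 1024
        System.out.println(cpuShares(4)); // 4096
    }
}
```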

--Bobby

On 5/8/13 11:55 PM, "牛兆捷" <nzjemail@gmail.com> wrote:

>Btw, if I set the container CPU to less than 1, what will happen? Can many
>containers share one core?
>
>
>2013/5/9 Robert Evans <evans@yahoo-inc.com>
>
>> Then I am really not sure what is happening.  Try profiling your task.
>>
>> --Bobby
>>
>> On 5/8/13 11:48 AM, "牛兆捷" <nzjemail@gmail.com> wrote:
>>
>> >Just for simplicity, I run only one map task on, say, 256 MB of input;
>> >then I set my io.sort.memory to more than 512 MB to make sure all input
>> >can stay in memory. I also checked the log to make sure there is just one
>> >spill for flushing.
>> >
>> >So I think the different parts run one by one, but the CPU utilization
>> >is not what I expected.
>> >
>> >
>> >2013/5/9 牛兆捷 <nzjemail@gmail.com>
>> >
>> >> I have enough memory, so there will be only one sort and spill. Why
>> >> do they happen in parallel?
>> >>
>> >>
>> >> 2013/5/9 Robert Evans <evans@yahoo-inc.com>
>> >>
>> >>> Yes, it all happens in parallel, even in a single task.
>> >>>
>> >>> On 5/8/13 11:17 AM, "牛兆捷" <nzjemail@gmail.com> wrote:
>> >>>
>> >>> >I forgot to say: to see the behavior of a single task, I just run
>> >>> >one map task on a 1 GB input split (I set the block size to 1 GB).
>> >>> >
>> >>> >
>> >>> >2013/5/9 Robert Evans <evans@yahoo-inc.com>
>> >>> >
>> >>> >> Deciding on the input split happens in the client.  Each map
>> >>> >> process just opens up the input file and seeks to the appropriate
>> >>> >> offset in the file.  At that point it reads each entry one at a
>> >>> >> time and sends it to the map task.  The output of the map task is
>> >>> >> placed in a buffer.  When the buffer gets close to full the data is
>> >>> >> sorted and spilled out to disk in parallel with the map task still
>> >>> >> running.  It is hard to get CPU time for the different parts
>> >>> >> because they are all happening in parallel.  If you do have enough
>> >>> >> RAM to store the entire output in memory and you have configured
>> >>> >> your sort buffer to be able to hold it all then you will probably
>> >>> >> only sort/spill once.
>> >>> >>
>> >>> >> --Bobby
>> >>> >>
>> >>> >> On 5/8/13 10:25 AM, "牛兆捷" <nzjemail@gmail.com> wrote:
>> >>> >>
>> >>> >> >I looked at the application container log to trace the
>> >>> >> >map-reduce application.
>> >>> >> >
>> >>> >> >For a map task, I find there are mainly 3 phases: split input,
>> >>> >> >sort, and spill out. I set enough memory to make sure the input
>> >>> >> >can stay in memory.
>> >>> >> >
>> >>> >> >Initially, I thought the highest CPU utilization would appear in
>> >>> >> >the sort phase, because the other two phases focus on IO; however,
>> >>> >> >it doesn't behave as I thought. On the contrary, the CPU
>> >>> >> >utilization during the other phases is higher.
>> >>> >> >
>> >>> >> >Anyone know the reason?
>> >>> >> >
>> >>> >> >--
>> >>> >> >*Sincerely,*
>> >>> >> >*Zhaojie*
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >--
>> >>> >*Sincerely,*
>> >>> >*Zhaojie*
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> *Sincerely,*
>> >> *Zhaojie*
>> >>
>> >
>> >
>> >
>> >--
>> >*Sincerely,*
>> >*Zhaojie*
>>
>>
>
>
>-- 
>*Sincerely,*
>*Zhaojie*
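
The single-spill condition discussed in this thread can be sanity-checked
with simple arithmetic, assuming the classic MapReduce settings io.sort.mb
(sort buffer size in MB, default 100) and io.sort.spill.percent (fill
fraction that triggers a background spill, default 0.80). This is an
illustrative sketch, not Hadoop's actual spill accounting (which also
reserves part of the buffer for record metadata):

```java
// Back-of-the-envelope check for the "single spill" condition: a map
// task spills only once -- at the end, when the buffer is flushed -- if
// its entire output fits under the spill threshold of the sort buffer.
public class SpillEstimate {
    static long spillCount(long outputBytes, int ioSortMb, double spillPercent) {
        long threshold = (long) (ioSortMb * 1024L * 1024L * spillPercent);
        if (outputBytes <= threshold) {
            return 1; // one final flush, no mid-task spills
        }
        // Rough upper bound: each time the buffer passes the threshold,
        // a sorted spill file is written while the map keeps running.
        return (outputBytes + threshold - 1) / threshold;
    }

    public static void main(String[] args) {
        // 256 MB of map output, 512 MB buffer at the default 0.80 threshold:
        System.out.println(spillCount(256L << 20, 512, 0.80)); // 1
        // Same output with the default 100 MB buffer:
        System.out.println(spillCount(256L << 20, 100, 0.80)); // 4
    }
}
```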

