From: Xuri Nagarin <secsubs@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 10 Oct 2013 13:50:46 -0700 (PDT)
Subject: Re: Improving MR job disk IO

On Thu, Oct 10, 2013 at 1:27 PM, Pradeep Gollakota wrote:

> I don't think it necessarily means that the job is a bad candidate for MR.
> It's a different type of workload. Hortonworks has a great article on the
> different types of workloads you might see and how that affects your
> provisioning choices at
> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html

One statement that stood out to me in the link above is: "For these reasons, Hortonworks recommends that you either use the Balanced workload configuration or invest in a pilot Hadoop cluster and plan to evolve as you analyze the workload patterns in your environment."

Now, this is not a critique/concern of Hortonworks but rather of Hadoop: what if my workloads can be both CPU and IO intensive? Do I just take the approach of throwing in enough excess hardware just in case?

> I have not looked at the Grep code so I'm not sure why it's behaving the
> way it is. Still curious that streaming has a higher IO throughput and
> lower CPU usage. It may have to do with the fact that /bin/grep is a native
> implementation and Grep (Hadoop) is probably using the Java Pattern/Matcher
> API.

The Grep code is from the bundled examples in CDH. I made a one-line modification for it to read SequenceFiles.
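For readers following the thread: the bundled Grep example mentioned here is typically launched along these lines (the jar location, HDFS paths, and regex below are placeholders, not taken from the thread; the pattern argument is a Java regex, which is why the mapper goes through java.util.regex rather than native grep):

```shell
# Launch the Grep example bundled with the Hadoop/CDH examples jar.
# Jar path, input/output paths, and the regex are illustrative only.
hadoop jar /usr/lib/hadoop/hadoop-examples.jar grep \
    /user/xuri/input /user/xuri/grep-out 'ERROR.*timeout'
```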
The streaming job probably does not have lower CPU utilization, but I do see that it evens out the CPU utilization across all the available processors. I guess the native grep binary parallelizes better than the Java MR job?

Which brings me to ask: if you have the mapper/reducer functionality built into a platform-specific binary, won't it always be more efficient than a Java MR job? And in such cases, am I better off with streaming than with Java MR?

Thanks for your responses.

> On Thu, Oct 10, 2013 at 12:29 PM, Xuri Nagarin wrote:
>
>> Thanks Pradeep. Does it mean this job is a bad candidate for MR?
>>
>> Interestingly, running the cmdline '/bin/grep' under a streaming job
>> provides (1) much better disk throughput and (2) CPU load that is almost
>> evenly spread across all cores/threads (no CPU gets pegged at 100%).
>>
>> On Thu, Oct 10, 2013 at 11:15 AM, Pradeep Gollakota wrote:
>>
>>> Actually... I believe that is expected behavior. Since your CPU is
>>> pegged at 100%, you're not going to be IO bound. Typically, jobs tend
>>> to be CPU bound or IO bound. If you're CPU bound, you expect to see
>>> low IO throughput. If you're IO bound, you expect to see low CPU usage.
>>>
>>> On Thu, Oct 10, 2013 at 11:05 AM, Xuri Nagarin wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a simple Grep job (from the bundled examples) that I am running
>>>> on an 11-node cluster. Each node has 2x8-core Intel Xeons (shows 32
>>>> CPUs with HT on), 64GB RAM, and 8 x 1TB disks. I have mappers set to
>>>> 20 per node.
>>>>
>>>> When I run the Grep job, I notice that CPU gets pegged at 100% on
>>>> multiple cores, but disk throughput remains a dismal 1-2 MBytes/sec on
>>>> a single disk on each node. So I guess the cluster is performing poorly
>>>> in terms of disk IO. Running TeraSort, I see each disk put out 25-35
>>>> MBytes/sec, with a total cluster throughput above 1.5 GBytes/sec.
>>>>
>>>> How do I go about re-configuring or re-writing the job to utilize
>>>> maximum disk IO?
>>>> TIA,
>>>>
>>>> Xuri
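The streaming setup discussed in this thread can be sketched roughly as follows (the streaming jar location and HDFS paths are placeholders and vary by distribution; this is an illustrative sketch, not the exact command used above):

```shell
# Hadoop streaming job with the native /bin/grep as the mapper, no reducers.
# Jar and HDFS paths are illustrative placeholders, not from the thread.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -D mapred.reduce.tasks=0 \
    -input /user/xuri/input \
    -output /user/xuri/grep-stream-out \
    -mapper '/bin/grep pattern'
```

One caveat worth noting: grep exits with status 1 when a map task's input contains no match, and streaming treats a non-zero mapper exit status as task failure by default, so wrapping the command (e.g. `-mapper "sh -c 'grep pattern; true'"`) is a common workaround.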